Even,
Thanks for the comments, that helps a lot. If I understand correctly now:
"input" alone - not modified
"dataset" alone - modified / is input and output
"input" with "dataset" alias - the input may be modified
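This is a minimal sketch of the three rules above, written as a predicate over an argument's long name and aliases. `input_may_be_modified` is a hypothetical helper for illustration, not part of GDAL:

```python
def input_may_be_modified(long_name, aliases=()):
    """Hypothetical encoding of the naming convention:
    - "input" alone: not modified
    - "dataset" alone: input and output, so modified
    - "input" with a "dataset" alias: may be modified
    In all cases the presence of "dataset" is what signals modification."""
    return "dataset" in (long_name, *aliases)


# Examples taken from help output quoted in this thread
print(input_may_be_modified("dataset"))               # True  (raster edit)
print(input_may_be_modified("input", ("dataset",)))   # True  (raster info)
print(input_may_be_modified("input"))                 # False (dataset check)
```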
Going back through with that in mind, it looks largely consistent with
a small number of exceptions. Starting with "raster edit", it has only
"dataset" which is input and output, then...
"raster info"
-i, --dataset, --input <INPUT>
Makes sense, input may be modified, e.g., if --stats are computed they
may be set back on the input dataset.
"vector info"
-i, --dataset, --input <INPUT>
Same, input may be modified if --sql is given for UPDATE, DELETE, etc.
Exceptions:
"vector sql"
-i, --input <INPUT>
Should it also have the "dataset" alias?
Starting with GDAL 3.12, when using --update, and without an output dataset
specified, this can be used to execute statements that modify the input
dataset, such as UPDATE, DELETE, etc.
"gdal info"
-i, --input <INPUT>
Add "dataset" alias to align with "raster info" and "vector info"?
(and "mdim info" also has "dataset" alias)
Then these three probably should be aligned:
"raster overview add"
-i, --dataset, --input <INPUT> Dataset (to be updated in-place, unless --external)
"raster overview delete"
--dataset <DATASET> Dataset (to be updated in-place, unless --read-only)
"raster overview refresh"
--dataset <DATASET> Dataset (to be updated in-place, unless --external)
This one has the "dataset" alias but does not modify its input; it seems to be
the lone exception in that regard:
"raster update"
-i, --dataset, --input <INPUT>
-o, --output <OUTPUT>
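The exceptions listed so far can be checked mechanically. This is a minimal sketch, assuming the rule "dataset appears among the names if and only if the dataset may be modified" and using a hand-written table transcribed from the help output quoted here (not queried from GDAL):

```python
# Names transcribed from the help lines quoted in this thread:
# command -> (long name, aliases, may the command modify this dataset?)
COMMANDS = {
    "raster info": ("input", ("dataset",), True),
    "vector info": ("input", ("dataset",), True),
    "vector sql": ("input", (), True),  # with --update and no output (GDAL 3.12+)
    "raster overview add": ("input", ("dataset",), True),
    "raster overview delete": ("dataset", (), True),
    "raster overview refresh": ("dataset", (), True),
    "raster update": ("input", ("dataset",), False),
}


def names_of(cmd):
    """All names (long name plus aliases) accepted for the dataset argument."""
    name, aliases, _ = COMMANDS[cmd]
    return frozenset((name, *aliases))


def convention_mismatches(commands):
    """Yield commands where having "dataset" among the names disagrees with
    whether the dataset may be modified."""
    for cmd, (name, aliases, modifies) in commands.items():
        if ("dataset" in (name, *aliases)) != modifies:
            yield cmd


print(list(convention_mismatches(COMMANDS)))  # ['vector sql', 'raster update']

# The three overview commands should expose the same set of names
OVERVIEW = ["raster overview add", "raster overview delete", "raster overview refresh"]
print(len({names_of(c) for c in OVERVIEW}) == 1)  # False: only "add" also has --input
```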
I think "dataset" should be kept based on that understanding.
I don't have strong feelings on the <meta_var> names other than
consistency is good. The existing "inconsistencies" are pretty minor,
so I'm not sure changes are really needed.
The aim is not to be super nitpicky over arg names and their aliases.
Motivation is API usage where application code takes dataset objects
as user input to CLI algorithms. The dataset objects may carry
information that should be used to parameterize the algorithm call.
The user may have already set properties on the object and should not
have to provide those explicitly again when passing to an algorithm.
The algorithm arg names must be used in parsing in some cases to infer
meaning, i.e., we cannot always rely only on querying properties of
the AlgorithmArg object. An example is "like" / "like-layer" /
"like-sql" / "like-where". Parsing for those needs to rely on the arg
names (and any potential aliases) for meaning. So the names/aliases
really should always be consistent, which I believe they are in that
case.
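To make the motivation concrete, here is a minimal sketch of application code binding properties carried by a user object onto algorithm args by name or alias. `Arg` and `bind` are hypothetical stand-ins, not GDAL's `AlgorithmArg` API, and the property values are made up. The point is only that a key such as "dataset" binds where the alias is declared and silently falls through where it is not:

```python
from dataclasses import dataclass


@dataclass
class Arg:
    """Hypothetical stand-in for an algorithm argument (not GDAL's AlgorithmArg)."""
    name: str
    aliases: tuple = ()

    def matches(self, key):
        return key == self.name or key in self.aliases


def bind(args, properties):
    """Map property keys set on a user object onto algorithm args, matching
    by name or alias. Keys that match no arg are left unbound."""
    bound = {}
    for key, value in properties.items():
        for arg in args:
            if arg.matches(key):
                bound[arg.name] = value
                break
    return bound


# "like"-style args: the meaning comes from the names themselves
like_args = [Arg("like"), Arg("like-layer"), Arg("like-sql"), Arg("like-where")]
print(bind(like_args, {"like-layer": "roads", "like-where": "type = 'primary'"}))

# The same "dataset" key binds only where the alias is declared
print(bind([Arg("input", ("dataset",))], {"dataset": "a.tif"}))  # {'input': 'a.tif'}
print(bind([Arg("input")], {"dataset": "a.tif"}))                # {}
```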
"input" and "dataset" aren't quite the same since we can query the
AlgorithmArg object and determine what it's for. But the more
consistent they can be in usage the better IMO.
Chris
On Sun, Apr 5, 2026 at 4:21 AM Even Rouault wrote:
Hi Chris,
The main issue is the occasional use of "dataset" as an alias for
"input". It's inconsistently available as an alias, which seems not
ideal, but it also shows up in unexpected ways.
"raster edit" has only --dataset with no --input or -i:
--dataset <DATASET>
The rationale was that raster edit only takes a single dataset, which is
both input and output. "Input" could also suggest that it won't be
modified, which is not the case here. But I see we have hesitated, in
several similar (or similar-looking but subtly different) situations,
over whether to expose input, dataset, or both.
"raster overview add" has:
-i, --dataset, --input <INPUT>
But "raster overview delete" and "raster overview refresh" have only:
--dataset <DATASET>
A dataset-specific one "dataset check" doesn't use it:
-i, --input <INPUT>
For dataset check, the dataset isn't modified.
Is the "dataset" alias really worth having?
Good question. Happy to hear others' opinions on this.
A couple of others are unique cases that may not be a problem. These just
stand out as different since the meta_var rarely deviates from the naming
pattern.
"raster calc" has:
-i, --input <INPUTS>
Plural to suggest you can specify several ones
"raster blend" has:
-i, --color-input, --input <COLOR-INPUT>
The metavar is important as a reminder of the semantics, because the command
also accepts a second input dataset: --overlay <OVERLAY>
Those are the only cases I've found where the meta_var name is
different than the long name. Nearly all have <INPUT> for --input even
if there is an alias, e.g., "raster pansharpen" has `-i,
--panchromatic, --input <INPUT>`.
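The default behavior described here (metavar equal to the upper-cased long name, aliases listed before it) can be sketched as a small formatter. `help_line` is a hypothetical helper that only mimics the help lines quoted in this thread; it is not how GDAL actually renders its usage output:

```python
def help_line(long_name, aliases=(), short=None, metavar=None):
    """Render a help line in the style quoted in this thread, e.g.
    "-i, --dataset, --input <INPUT>". The default metavar is the long
    name upper-cased; an explicit metavar overrides it."""
    parts = []
    if short:
        parts.append(f"-{short}")
    parts += [f"--{alias}" for alias in aliases]
    parts.append(f"--{long_name}")
    return f"{', '.join(parts)} <{metavar or long_name.upper()}>"


print(help_line("input", ("dataset",), "i"))        # -i, --dataset, --input <INPUT>
print(help_line("input", ("panchromatic",), "i"))   # -i, --panchromatic, --input <INPUT>
print(help_line("input", ("color-input",), "i", "COLOR-INPUT"))
print(help_line("dataset"))                         # --dataset <DATASET>
```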
Similar to raster blend:
-i, --panchromatic, --input <INPUT> Input panchromatic raster dataset [required]
--spectral <SPECTRAL> Input spectral band dataset [1.. values] [required]
The input name helps here to remember which dataset is implicit and which is
not when the step is used in a pipeline context (input must not be specified,
since it is the result of the previous step):
gdal raster pipeline read panchro.tif ! pansharpen multispectral.tif ! write out.tif
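A toy model of that implicit-input rule, assuming a hypothetical table of positional args per step. This is not GDAL's pipeline parser; it ignores options and truncates multi-valued args such as --spectral [1.. values] to a single value. It only shows why, in the pansharpen step, the first positional value lands on --spectral:

```python
# Hypothetical positional-arg order per step (illustration only)
POSITIONALS = {
    "read": ["input"],
    "pansharpen": ["input", "spectral"],
    "write": ["output"],
}


def bind_pipeline(pipeline):
    """Split a pipeline on "!" and bind each step's positional values.
    Every step after the first gets its "input" implicitly from the
    previous step, so its positional values start at the next argument."""
    steps = []
    for i, tokens in enumerate(s.split() for s in pipeline.split("!")):
        name, values = tokens[0], tokens[1:]
        slots = POSITIONALS[name]
        if i > 0 and slots and slots[0] == "input":
            slots = slots[1:]  # input comes from the previous step
        steps.append((name, dict(zip(slots, values))))
    return steps


for step in bind_pipeline("read panchro.tif ! pansharpen multispectral.tif ! write out.tif"):
    print(step)
# ('read', {'input': 'panchro.tif'})
# ('pansharpen', {'spectral': 'multispectral.tif'})
# ('write', {'output': 'out.tif'})
```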
I checked several others that can
take multiple input datasets, and "raster calc" is the only one I
found with plural INPUTS. Maybe that's not a big deal because the
meta_var is only for display in the documentation? It could still be
worth making them consistent for readability.
Yes, we could use plural INPUTS in other situations where input accepts
multiple files.
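As a sketch of that suggestion, the default metavar could be pluralized whenever an argument accepts more than one value. `default_metavar` and its `max_count` parameter are hypothetical, and the naive "append S" rule is an assumption that happens to work for INPUT and DATASET:

```python
def default_metavar(long_name, max_count=1):
    """Hypothetical rule: the metavar is the upper-cased long name,
    pluralized when the argument accepts more than one value
    (max_count > 1), as "raster calc" already does with <INPUTS>."""
    metavar = long_name.upper()
    return metavar + "S" if max_count > 1 else metavar


print(default_metavar("input"))                # INPUT
print(default_metavar("input", max_count=10))  # INPUTS
```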
Since <INPUT> is almost
always used for the positional input dataset(s), when I see
<COLOR-INPUT> it looks like possibly something other than a raster
dataset.
Should the metavar of INPUT be INPUT-DATASET in the general case, and
COLOR-INPUT-DATASET for blend / PANCHRO-DATASET for pansharpen?
--
http://www.spatialys.com
My software is free, but my time generally not.