Scalar coordinates are not just a convenience; in some cases
they are the clearest way to locate the data in time and space.
As the CF document (5.7) says, sometimes
'there is no associated dimension.'
I'm truly confused by Mark's statement:
'Scalar coordinate variables have the
same information content and can be used in the same contexts as
a size one coordinate variable.'
But this statement is not quite true: the ordering of dimensions
is not encoded, and the ability to link many coordinates to the
same dimension is lost. The assumption in this statement is an
aspiration which I think cannot be delivered without
particularly strict limitations on the use of scalars during
encoding.
Nowhere in the conventions does it state that if more than one
single-valued coordinate is related to the same degree of
freedom, a dimension must be declared for these and this
relationship explicitly encoded.
First, what do you mean by the ordering of dimensions of size 1?
And, can you please elaborate on the last sentence - is the goal
to indicate that there's a relationship between sets of scalar
coordinates that represent the same axis? Don't we provide that
by using the 'axis' attribute? Is there some other rationale for
this?

For our meteorology time series data, we have singleton latitude
and longitude, unlimited time, and numerous sensor heights. We
provide time as a dimension and the others as scalar coordinates.
I don't think we would gain clarity by creating a height
dimension for each instrument's Z position. Our data is really a
1 dimensional array where the Z coordinate is slightly different
for each variable, and that's what we code it as.
Cheers - Nan
On 5/21/13 7:22 AM, Kenneth S. Casey - NOAA Federal wrote:
Hi Everyone,
After spending 15 minutes reading this and Jonathan's
previous post, and trying hard to be sure I really understand
them, I am left wondering if a "convenience feature" is really a
convenience at all. I know programmers don't like clutter in
their code and have a certain aesthetic to uphold (I used to do
much more serious programming in my younger days, but am now
just a lazy Matlaber), but is the added complexity of having
these two options, copied from Jonathan's post:
float height;           // scalar coordinate variable
  height: standard_name="height";
float temp(lat,lon);
  temp: standard_name="air_temperature";
  temp: coordinates="height";

float height(height);   // size-one coordinate var, with dimension height=1
  height: standard_name="height";
float temp(height,lat,lon);
  temp: standard_name="air_temperature";
really worth it? The manual must be longer to describe the
"convenience", and the application programmers and downstream
users of the data now have to build complexity into THEIR code
to handle both cases. This list has to take time explaining and
debating the options, and I have to take time explaining it to
both data producers and data consumers who have never used
netCDF or CF before. Maybe that is not such a big deal on a
case by case basis, but in the broader picture when folks like
me are trying to get everyone and their brothers and sisters and
mothers and fathers using CF-netCDF, it is a real pain. My vote
is to simplify. Use the second example, since it handles the
case where height=1 as easily as the case where height=1000. I
say be explicit. Does the first example gain us any
functionality? If not, then don't use it and stay away from it.
Encourage others to do the same.
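
To make the reader-side cost concrete, here is a minimal Python sketch of the branching an application might need in order to accept both encodings. It is only an illustration under stated assumptions: the `Var` class and the `locate_coordinate` function are hypothetical stand-ins for a real netCDF reader, and the two dictionaries mirror only the metadata shown in the two examples above.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    """Hypothetical stand-in for a netCDF variable (not a real library type)."""
    dims: tuple = ()
    attrs: dict = field(default_factory=dict)


def locate_coordinate(variables, data_name, standard_name):
    """Return (name, encoding) of the coordinate with the given standard_name."""
    data_var = variables[data_name]

    # Second example: a coordinate variable that shares its name with one of
    # the data variable's dimensions.
    for dim in data_var.dims:
        coord = variables.get(dim)
        if (coord is not None and coord.dims == (dim,)
                and coord.attrs.get("standard_name") == standard_name):
            return dim, "dimension coordinate"

    # First example: a dimensionless variable named in the data variable's
    # "coordinates" attribute.
    for name in data_var.attrs.get("coordinates", "").split():
        coord = variables.get(name)
        if (coord is not None and coord.dims == ()
                and coord.attrs.get("standard_name") == standard_name):
            return name, "scalar coordinate"

    return None, None


# The two encodings from the examples above, reduced to their metadata.
first_example = {
    "height": Var((), {"standard_name": "height"}),
    "temp": Var(("lat", "lon"),
                {"standard_name": "air_temperature", "coordinates": "height"}),
}
second_example = {
    "height": Var(("height",), {"standard_name": "height"}),
    "temp": Var(("height", "lat", "lon"),
                {"standard_name": "air_temperature"}),
}

print(locate_coordinate(first_example, "temp", "height"))
print(locate_coordinate(second_example, "temp", "height"))
```

Both calls find the same height coordinate, but only because the reader checks two separate code paths, one per encoding.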
Ken
The term 'convenience feature' is
mentioned in the conventions document:
'The new scalar coordinate variable is a convenience
feature which avoids adding size one dimensions to
variables.'
Data creators have seen the benefits of not encoding size
one dimensions and have made use of this feature; it has
proved very convenient. The conventions go on to say:
'Scalar coordinate variables have the same information
content and can be used in the same contexts as a size one
coordinate variable.'
But this statement is not quite true: the ordering of
dimensions is not encoded, and the ability to link many
coordinates to the same dimension is lost. The assumption
in this statement is an aspiration which I think cannot be
delivered without particularly strict limitations on the use
of scalars during encoding.
Nowhere in the conventions does it state that if more than
one single-valued coordinate is related to the same degree
of freedom, a dimension must be declared for these and this
relationship explicitly encoded.
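
The loss of linkage can be illustrated with a small Python sketch. It is a hedged illustration only: the dimension name `z` and the coordinate names `pressure`, `model_level` and `height` are hypothetical, and the dictionaries describe declared dimensions rather than any real file. Grouping coordinates by their declared dimensions recovers the relationship when a size one dimension is present, and cannot recover it when every coordinate is a scalar.

```python
from collections import defaultdict

# Hypothetical coordinates, given as name -> tuple of declared dimensions.
with_dimension = {
    "pressure": ("z",),      # coordinate on the size one dimension z
    "model_level": ("z",),   # second coordinate on the same dimension
    "height": (),            # unrelated scalar coordinate
}
with_scalars = {
    "pressure": (),
    "model_level": (),
    "height": (),
}


def group_by_dimensions(coords):
    """Group coordinate names by the dimensions they are declared on."""
    groups = defaultdict(list)
    for name, dims in coords.items():
        groups[dims].append(name)
    return dict(groups)


print(group_by_dimensions(with_dimension))
print(group_by_dimensions(with_scalars))
```

In the first case `pressure` and `model_level` fall into the same group because they share `z`; in the second all three coordinates land in one undifferentiated group, so the file alone does not say which of them describe the same degree of freedom.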
Later, the case of character strings is addressed:
'If a character variable has only one dimension (the
maximum length of the string), it is regarded as a
string-valued scalar coordinate variable, analogous to a
numeric scalar coordinate variable (see Section 5.7, “Scalar
Coordinate Variables”) '
which is a required feature, but the NUG only allows
numerical valued data arrays as Coordinate Variables, so a
further section is added, in the Terminology:
'scalar coordinate variable
A scalar variable that contains coordinate data.
Functionally equivalent to either a size one coordinate
variable or a size one auxiliary coordinate variable. '
These statements together provide information on how to
write files, but they are limited in their assistance to
file reading and interpretation.
The conventions are not clear on how, or whether, to make a
distinction for a particular scalar coordinate: they do not
say that a scalar coordinate is a Coordinate Variable or an
Auxiliary Coordinate Variable; they say it is functionally
equivalent to either one or the other.
I have read these sections to mean that by encoding a scalar
coordinate the data creator is not providing information
about how the coordinate is related to the dimensions in the
file, other than to say it applies to all of the cells
currently in the file.
As such, I disagree with the statement that
'Scalar coordinate variables have the same information
content and can be used in the same contexts as a size one
coordinate variable.'
In many cases this will turn out to be a valid
interpretation but it is not the only one, and this nuance
is a really useful feature, which many data creators have
benefited from.
From one point of view, a third type of Coordinate exists in
CF, the Scalar Coordinate, which is neither a Coordinate
Variable, nor an Auxiliary Coordinate. From another point
of view a Scalar Coordinate is an Auxiliary Coordinate
which has the potential to be an emergent Coordinate
Variable, if required and consistent for the data consumer.
(I am sure there are other useful perspectives we can
consider.)
We have come across many data sets from other data creators
where a considered reading of the data suggests that they
have taken an interpretation such as this as well. No
distinction has been made between scalars which represent a
degree of freedom and scalars which do not.
The scalar coordinate is a convenient feature allowing
metadata to be simply encoded in a clear manner and I feel
that the conventions document should adapt to reflect the
usage some sections of the community have adopted. It is
not ambiguous: it provides sufficient information to work
with the file, and the data and metadata are well specified.
Indeed when converting from other formats (such as GRIB and
BUFR) to CF it is the logical way to encode the available
metadata.
I am concerned about the implications for these data sets if
the interpretation of scalar coordinates is tightened in a
future version of the conventions document to explicitly
disallow this useful and well used point of view. I would
like to stress again Jonathan's point, that all of this data
is CF compliant; the question is how consumers interpret the
semantics of the data set.
I think the utility of the scalar coordinate variable is
significantly diminished if Option B or some derivative of
it is pursued for the next version of the conventions.
Option A preserves all of the interpretations of Option B
intact, but with caution needed on loading and
interpretation not to read too much information into any
scalar coordinates present.
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Kenneth S. Casey, Ph.D.
Technical Director
NOAA National Oceanographic Data Center
1315 East-West Highway
Silver Spring MD 20910
301-713-3272 x133
http://www.nodc.noaa.gov
--
*******************************************************
* Nan Galbraith Information Systems Specialist *
* Upper Ocean Processes Group Mail Stop 29 *
* Woods Hole Oceanographic Institution *
* Woods Hole, MA 02543 (508) 289-2444 *
*******************************************************