I support Mark's view and the current two options for specifying "scalar" coordinates. Although the concept of a "scalar" coordinate may take some time to explain to data providers, I don't think it's difficult to describe the two options (Ken's example makes it quite clear).

Karl Taylor

On 5/23/13 2:04 AM, Hedley, Mark wrote:
I can appreciate Ken's point of view, particularly in cases like the example 
Jonathan posted: with only one scalar coordinate, not much is gained by 
encoding it as a scalar.

However, I support Jonathan's statement about omitting size-one dimensions, 
particularly where there are several of them.  It is often helpful not to 
encode information about the ordering of dimensions of size one.

In addition to this, I am keen to make sure that data creators are able to 
encode metadata where the dependency relationship between multiple scalars is 
not encoded.  In numerous cases, the relationship is not uniquely defined and 
should not be encoded.

For example, a data creator may have a set of descriptors and identifiers which 
describe how a particular data set relates to a larger study, such as a 
multi-model analysis meta-experiment.  Each data set is produced by a model 
with a collection of scalar coordinates from that model run, e.g. 'ensemble 
member number', 'experiment id', 'perturbation scheme', 'forcing parameter a', 
'forcing parameter b', ... etc.

Given different collections of such data sets, different relationships may 
emerge from the collection, enabling different types of analysis. It is a 
really useful approach to encode all these quantities as scalars, and interpret 
these scalars as potential degrees of freedom with potential inter-relations; 
the degrees of freedom and inter-relationships are then emergent properties of 
the collection, not defined in any individual member data set.
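
A minimal sketch of this idea in Python, assuming each data set's scalar 
coordinates have already been read into a plain dictionary. The function name 
`varying_scalars` and the keys `ensemble_member`, `experiment_id` and 
`forcing_a` are illustrative only and do not come from any CF library. The 
scalars that take more than one value across a collection are the candidate 
degrees of freedom for that collection.

```python
def varying_scalars(datasets):
    """Return {scalar name: sorted distinct values} for every scalar that
    takes more than one value across the collection of data sets."""
    values = {}
    for scalars in datasets:
        for name, value in scalars.items():
            values.setdefault(name, set()).add(value)
    return {name: sorted(vals) for name, vals in values.items() if len(vals) > 1}


# Hypothetical collection: three data sets that differ only in ensemble member.
collection = [
    {"ensemble_member": 1, "experiment_id": "control", "forcing_a": 0.5},
    {"ensemble_member": 2, "experiment_id": "control", "forcing_a": 0.5},
    {"ensemble_member": 3, "experiment_id": "control", "forcing_a": 0.5},
]

emergent = varying_scalars(collection)
print(emergent)
```

A different collection built from the same kind of data sets (say, one member 
from each of several experiments) would report a different set of varying 
scalars, without any individual file having to state which one it is.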

This seems to me to be the logical conclusion of the use of the term 'scalar 
coordinate' (as contrasted to 'vector coordinate', I assume) and it is really 
useful.

I think it would be a regressive step for CF to limit or complicate this facet 
of data comprehension by constraining the meaning of 'scalar coordinate' in the 
way that has been suggested.  I don't think such a constraint has been clear up 
to now in the conventions and people are making real use of the perceived 
flexibility.

mark
________________________________________
From: CF-metadata [[email protected]] on behalf of Jonathan 
Gregory [[email protected]]
Sent: 21 May 2013 18:42
To: [email protected]
Subject: Re: [CF-metadata] scalar coordinates

Dear Ken

Your argument is one that we should not have scalar coordinates at all. You
could propose that they should be removed from the next version of CF (and
that would certainly simplify this discussion, if agreed :-). They have been
in there for several years now, and I guess they are quite widely used,
because it is convenient for data-writers. I agree, it does require a bit
more work for data-readers. However, CF-compliant software should expect to
inspect the coordinates attribute in any case, and if it does that it will
automatically come across the scalar coordinate variables.
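
A rough sketch of what such a reader might do, assuming the file's variables
have already been loaded into plain Python dictionaries. The `coordinates_of`
helper and the `dims`/`attrs` layout are illustrative, not part of any CF
library. It gathers coordinate variables from the dimension names and scalar
or auxiliary coordinates from the coordinates attribute, so both encodings
quoted further down this thread yield the same set of coordinates.

```python
def coordinates_of(variables, name):
    """Collect the coordinate names that apply to data variable `name`.

    `variables` maps a variable name to {"dims": tuple, "attrs": dict}.
    """
    var = variables[name]
    found = []
    # Coordinate variables: 1-D variables named after one of the dimensions.
    for dim in var["dims"]:
        if dim in variables and variables[dim]["dims"] == (dim,):
            found.append(dim)
    # Auxiliary and scalar coordinates: named in the coordinates attribute.
    for coord in var["attrs"].get("coordinates", "").split():
        if coord in variables and coord not in found:
            found.append(coord)
    return found


# Scalar coordinate form: height has no dimension.
scalar_form = {
    "height": {"dims": (), "attrs": {"standard_name": "height"}},
    "lat": {"dims": ("lat",), "attrs": {}},
    "lon": {"dims": ("lon",), "attrs": {}},
    "temp": {"dims": ("lat", "lon"),
             "attrs": {"standard_name": "air_temperature",
                       "coordinates": "height"}},
}

# Size-one coordinate form: height is a dimension of length one.
size_one_form = {
    "height": {"dims": ("height",), "attrs": {"standard_name": "height"}},
    "lat": {"dims": ("lat",), "attrs": {}},
    "lon": {"dims": ("lon",), "attrs": {}},
    "temp": {"dims": ("height", "lat", "lon"),
             "attrs": {"standard_name": "air_temperature"}},
}

from_scalar = coordinates_of(scalar_form, "temp")
from_size_one = coordinates_of(size_one_form, "temp")
```

The two results contain the same names; only their order differs, because the
scalar form says nothing about where height sits among the dimensions.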

I think that an attractive feature of omitting the size-one dimensions is
not having to decide on the order of them in the data variable, which is
really arbitrary for storage in netCDF files, since it makes no difference
to the order of the data elements.
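
To illustrate why the position is arbitrary, here is a small Python sketch
(the function names are made up for this example). It computes the row-major
storage offset, which is the ordering netCDF uses, and checks that a lat-lon
element lands at the same offset wherever a size-one dimension is placed.

```python
def offset(index, shape):
    """Row-major storage offset of `index` in an array of the given `shape`."""
    pos = 0
    for i, n in zip(index, shape):
        pos = pos * n + i
    return pos


def same_layout(nlat, nlon):
    """True if inserting a size-one dimension at any position leaves the
    storage offset of every (lat, lon) element unchanged."""
    for i in range(nlat):
        for j in range(nlon):
            base = offset((i, j), (nlat, nlon))
            placements = [
                offset((0, i, j), (1, nlat, nlon)),  # size-one dim first
                offset((i, 0, j), (nlat, 1, nlon)),  # size-one dim in the middle
                offset((i, j, 0), (nlat, nlon, 1)),  # size-one dim last
            ]
            if any(p != base for p in placements):
                return False
    return True


unchanged = same_layout(3, 4)
```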

Best wishes

Jonathan


----- Forwarded message from "Kenneth S. Casey - NOAA Federal" 
<[email protected]> -----

From: "Kenneth S. Casey - NOAA Federal" <[email protected]>
Date: Tue, 21 May 2013 07:22:34 -0400
To: "Hedley, Mark" <[email protected]>
CC: "[email protected]" <[email protected]>
Subject: Re: [CF-metadata] scalar coordinates

Hi Everyone,

After spending 15 minutes reading this and Jonathan's previous post, and trying hard to 
be sure I really understand them, I am left wondering if a "convenience 
feature" is really a convenience at all.  I know programmers don't like clutter in 
their code and have a certain aesthetic to uphold (I used to do much more serious 
programming in my younger days, but am now just a lazy Matlaber), but is the added 
complexity of having these two options, copied from Jonathan's post:

  float height; // scalar coordinate variable
    height: standard_name="height";
  float temp(lat,lon);
    temp: standard_name="air_temperature";
    temp: coordinates="height";

  float height(height);  // size-one coordinate var, with dimension height=1
    height: standard_name="height";
  float temp(height,lat,lon);
    temp: standard_name="air_temperature";

really worth it?  The manual must be longer to describe the "convenience", and 
the application programmers and downstream users of the data now have to build complexity 
into THEIR code to handle both cases. This list has to take time explaining and debating 
the options, and I have to take time explaining it to both data producers and data 
consumers who have never used netCDF or CF before.   Maybe that is not such a big deal on 
a case by case basis, but in the broader picture when folks like me are trying to get 
everyone and their brothers and sisters and mothers and fathers using CF-netCDF, it is a 
real pain.  My vote is to simplify. Use the second example, since it handles the case 
where height=1 as easily as the case where height=1000.  I say be explicit.  Does the 
first example gain us any functionality?  If not, then don't use it and stay away from 
it.  Encourage others to do the same.

Ken


On May 21, 2013, at 5:48 AM, "Hedley, Mark" <[email protected]> 
wrote:

The term 'convenience feature' is mentioned in the conventions document:

  'The new scalar coordinate variable is a convenience feature which avoids 
adding size one dimensions to variables.'

Data creators have seen the benefits in not encoding size one dimensions and 
made use of this feature; it has proved very convenient.  The conventions go on 
to say:

  'Scalar coordinate variables have the same information content and can be 
used in the same contexts as a size one coordinate variable.'

But this statement is not quite true: the ordering of dimensions is not 
encoded, and the ability to link many coordinates to the same dimension is 
lost.  The assumption in this statement is an aspiration which I think cannot 
be delivered without particularly strict limitations on the use of scalars 
during encoding.
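
A small sketch of the second point, assuming coordinates are described only 
by their dimension tuples. The helper `coords_by_dimension` and the 
`pressure` coordinate are hypothetical, introduced for illustration. With a 
size-one dimension, two coordinates can be declared on the same dimension; 
as scalars, the file records no such link.

```python
def coords_by_dimension(coords):
    """Split coordinates into those linked to a named dimension and those
    that are scalar and so declare no dimension at all.

    `coords` maps a coordinate name to its tuple of dimension names.
    Returns (groups, unattached): `groups` maps a dimension name to the
    coordinates declared on it; `unattached` lists the scalar coordinates.
    """
    groups = {}
    unattached = []
    for name, dims in coords.items():
        if not dims:
            unattached.append(name)
        for dim in dims:
            groups.setdefault(dim, []).append(name)
    return groups, unattached


# Size-one form: both coordinates share the dimension `height`.
size_one = {"height": ("height",), "pressure": ("height",)}
# Scalar form: the same two coordinates with no dimension declared.
scalars = {"height": (), "pressure": ()}

linked, none_left = coords_by_dimension(size_one)
no_links, loose = coords_by_dimension(scalars)
```

In the scalar form `no_links` is empty: a reader cannot tell from the file 
whether the two values belong to the same degree of freedom.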

Nowhere in the conventions does it state that if more than one single-valued 
coordinate is related to the same degree of freedom, a dimension must be 
declared for these and this relationship explicitly encoded.

Later, the case of character strings is addressed:

  'If a character variable has only one dimension (the maximum length of the 
string), it is regarded as a string-valued scalar coordinate variable, 
analogous to a numeric scalar coordinate variable (see Section 5.7, "Scalar 
Coordinate Variables")'

which is a required feature, but the NUG only allows numerical valued data 
arrays as Coordinate Variables, so a further section is added, in the  
Terminology:

  'scalar coordinate variable
    A scalar variable that contains coordinate data. Functionally equivalent to 
either a size one coordinate variable or a size one auxiliary coordinate 
variable. '

These statements together provide information on how to write files, but they 
are limited in their assistance to file reading and interpretation.

The conventions are not clear on how, or whether, to make a distinction for a 
particular scalar coordinate: they do not say that a scalar coordinate is a 
Coordinate Variable or an Auxiliary Coordinate Variable; they say it is 
functionally equivalent to either one or the other.

I have read these sections to mean that by encoding a scalar coordinate the 
data creator is not providing information about how the coordinate is related 
to the dimensions in the file, other than to say it applies to all of the cells 
currently in the file.

As such, I disagree with the statement that

  'Scalar coordinate variables have the same information content and can be 
used in the same contexts as a size one coordinate variable.'

In many cases this will turn out to be a valid interpretation but it is not the 
only one, and this nuance is a really useful feature, which many data creators 
have benefited from.

From one point of view, a third type of Coordinate exists in CF, the Scalar 
Coordinate, which is neither a Coordinate Variable nor an Auxiliary 
Coordinate.  From another point of view, a Scalar Coordinate is an Auxiliary 
Coordinate which has the potential to be an emergent Coordinate Variable, if 
required and consistent for the data consumer.  (I am sure there are other 
useful perspectives we can consider.)

We have come across many data sets from other data creators where a considered 
reading of the data suggests that they have taken an interpretation such as 
this as well.  No distinction has been made between scalars which represent a 
degree of freedom and scalars which do not.

The scalar coordinate is a convenient feature allowing metadata to be simply 
encoded in a clear manner, and I feel that the conventions document should adapt 
to reflect the usage some sections of the community have adopted.  It is not 
ambiguous: it provides sufficient information to work with the file, and the 
data and metadata are well specified.

Indeed when converting from other formats (such as GRIB and BUFR) to CF it is 
the logical way to encode the available metadata.

I am concerned about the implications for these data sets if the interpretation 
of scalar coordinates is tightened in a future version of the conventions 
document to explicitly disallow this useful and well-used point of view.  I 
would like to stress again Jonathan's point that all of this data is CF 
compliant; the question is how consumers interpret the semantics of the data 
set.

I think the utility of the scalar coordinate variable is significantly 
diminished if Option B or some derivative of it is pursued for the next version 
of the conventions.  Option A preserves all of the interpretations of Option B 
intact, but with caution needed on loading and interpretation not to read too 
much information into any scalar coordinates present.

_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Kenneth S. Casey, Ph.D.
Technical Director
NOAA National Oceanographic Data Center
1315 East-West Highway
Silver Spring MD 20910
301-713-3272 x133
http://www.nodc.noaa.gov




----- End forwarded message -----