All,
I'm for option B, though I might be persuaded to go for option A given a
compelling counter-example. The example that has been given regarding
forecast times seems out of step with common CF practice for
"forecast run aggregations". That context recognizes
forecast output collections as 5-dimensional datasets -- both the
calendar date of the forecast time step, and the run date of the model
are valid time coordinates. Ambiguity is not desirable in this case.
It is important to be able to traverse the collection along both types
of time axis. (See section 4 of
http://www.unidata.ucar.edu/software/netcdf/ncml/v2.2/FmrcAggregation.html)
A single forecast file, lifted from the context of the collection,
really does have two distinct types of degenerate time axes, reflecting
its position in a 5-dimensional conceptual space. The CF file should
not be implying that there is an arbitrary either-or choice between two
dates; it should make clear the semantic distinctions between the two.
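
To illustrate traversing a forecast collection along both kinds of time
axis, here is a minimal Python sketch. It is only an illustration: the
collection layout, the file-name pattern, and the function names are
hypothetical and do not come from CF or from the NcML FMRC aggregation.

```python
from datetime import datetime, timedelta

# Hypothetical collection: one single-time forecast file per
# (run date, forecast period) pair. The strings stand in for file names.
runs = [datetime(2013, 5, 10, 0), datetime(2013, 5, 10, 12)]
periods = [timedelta(hours=h) for h in (0, 12, 24)]

collection = {
    (run, period): f"fc_{run:%Y%m%d%H}_+{int(period.total_seconds() // 3600):02d}h"
    for run in runs
    for period in periods
}


def along_run(collection, run):
    """All forecasts from one model run, ordered by valid (calendar) time."""
    return sorted(
        (r + p, name) for (r, p), name in collection.items() if r == run
    )


def along_valid_time(collection, when):
    """All forecasts valid at one calendar time, ordered by run date."""
    return sorted(
        (r, name) for (r, p), name in collection.items() if r + p == when
    )


# Traverse along the run-date axis: which runs predicted 2013-05-10 12:00?
for run, name in along_valid_time(collection, datetime(2013, 5, 10, 12)):
    print(run, name)
```

Each file sits at exactly one point on both axes, which is why a single
file lifted out of the collection carries two distinct time coordinates.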
- Steve
==========================================================
On 5/10/2013 7:20 AM, Seth McGinnis wrote:
I'll agree with option A.
I can think of a number of cases where scalar coordinate variables
are a convenient way to record metadata about the positioning
of the data in space-time, but it's not like the data at other
positions actually exists and isn't recorded in this file; it's just a
way of formatting the metadata. Which makes it a bit weird to
insist that there's always a degenerate dimension associated
with the scalar coordinate.
Consider surface observations. A scalar coordinate is a sensible
way to record e.g. the height of the observation (2-m screen
height for temps & humidity vs 10-m anemometer height for winds),
but it's not as if there's an entire spectrum of different heights
for the observations that you're sampling from; those heights are
the only ones that there were or ever will be.
So I can't see any utility in requiring the height to be treated as a
dimension in that case. But there is some potential disutility, in that
if you've got software that slices and dices the data along different
dimensions, adding in a degenerate dimension for the height is
likely to just clutter things up and confuse the issue.
Cheers,
--Seth
On Fri, 10 May 2013 08:56:40 +0000
"Hattersley, Richard" <[email protected]> wrote:
Perhaps it might be helpful to add some context, i.e. "Why do I care?"
My understanding is, Jonathan Gregory and Mark Hedley intended to
resolve this ambiguity in a subsequent revision of CF. And that
resolution will have an impact on both data producers and data
consumers.
As a data producer you might care because you're producing data which
will become invalid. As a data consumer you might find that software
tools interpret data differently, and hence you might have to change
your code.
The question is this: "Does a Scalar Coordinate Variable....":
Option A: Represent either a Coordinate Variable or an Auxiliary
Coordinate? The presence of a scalar does not mandate the existence of
a new dimension; it can imply an undeclared dimension of size one
that is not explicitly defined in the file but it does not have to.
Or
Option B: Always represent a Coordinate variable which explicitly
declares a dimension of size one, where this dimension is not stated
in the file? An exception is provided for string scalar coordinate
variables only, which are defined as Auxiliary Coordinates but also
mandate a new dimension of size one.
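
The practical difference between the two options can be sketched in a
few lines of Python. This is an illustration under assumptions, not a
description of any CF tool: the Variable class and function names are
hypothetical, the implied dimensions are placed first purely for
convenience, and the string exception in option B is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    dims: tuple   # dimension names, e.g. ("y", "x")
    shape: tuple  # matching sizes, e.g. (3, 4)


def _with_leading_dims(data, coord_names):
    """Return a copy of data with one size-one dimension per coordinate name."""
    coord_names = tuple(coord_names)
    return Variable(
        data.name,
        coord_names + data.dims,
        (1,) * len(coord_names) + data.shape,
    )


def interpret_option_a(data, scalar_coords, promote=()):
    """Option A: scalars are metadata; the consumer picks which, if any,
    to treat as implying a size-one dimension."""
    return _with_leading_dims(data, [n for n in scalar_coords if n in promote])


def interpret_option_b(data, scalar_coords):
    """Option B: every scalar coordinate implies a size-one dimension."""
    return _with_leading_dims(data, scalar_coords)


temp = Variable("air_temperature", ("y", "x"), (3, 4))
print(interpret_option_a(temp, ["height"]).dims)  # dimensionality unchanged
print(interpret_option_b(temp, ["height"]).dims)  # height becomes a dimension
```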
It seems the difference hinges on the concept of "degrees of
freedom". In those terms...
Option A lets the data producer say, "Here are some scalar pieces of
metadata - data consumers can choose what to do with them."
Whereas option B implies, "These are the degrees of freedom - no more,
no less."
One impact of this is in the overdetermined case of time,
forecast_reference_time, and forecast_period. Even when a data variable
contains data for a single point in time, option B would require the
*producer* to decide which two variables describe the two degrees of
freedom, and which variable is the dependent variable.
But as a consumer I might choose to aggregate a collection of these
single-time-point data variables which are best parameterised by a
*different* pair of time, forecast_reference_time, or forecast_period.
In general, it's not possible for the data producer to know in advance
which two variables best parameterise the collection I'm interested in.
For this, and other related reasons involving ensembles, I'm in favour
of option A.
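
The overdetermined time case can be sketched as follows, assuming the
usual relation time = forecast_reference_time + forecast_period. The
function name is hypothetical; the point is only that any two of the
three values fix the third, so no single pair is privileged.

```python
from datetime import datetime, timedelta


def complete_time_triple(time=None, forecast_reference_time=None,
                         forecast_period=None):
    """Fill in whichever of the three values is missing.

    Assumes time == forecast_reference_time + forecast_period.
    Raises ValueError if fewer than two values are given or if all
    three are given but disagree.
    """
    values = (time, forecast_reference_time, forecast_period)
    if sum(v is not None for v in values) < 2:
        raise ValueError("need at least two of the three values")
    if time is None:
        time = forecast_reference_time + forecast_period
    elif forecast_reference_time is None:
        forecast_reference_time = time - forecast_period
    elif forecast_period is None:
        forecast_period = time - forecast_reference_time
    elif time != forecast_reference_time + forecast_period:
        raise ValueError("the three values are inconsistent")
    return time, forecast_reference_time, forecast_period


# A producer might record run date and lead time...
print(complete_time_triple(
    forecast_reference_time=datetime(2013, 5, 10, 0),
    forecast_period=timedelta(hours=6),
))
# ...while a consumer might prefer to work from valid time and lead time.
print(complete_time_triple(
    time=datetime(2013, 5, 10, 6),
    forecast_period=timedelta(hours=6),
))
```

Both calls describe the same single-time-point data; which pair is the
natural parameterisation depends on the collection being aggregated.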
Richard Hattersley
Iris Benevolent Dictator
Met Office
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata