#147: clarification of standard and correction of conformance doc: formula_terms
-----------------------------+------------------------------
Reporter: taylor13 | Owner: cf-conventions@…
Type: defect | Status: new
Priority: high | Milestone:
Component: cf-conventions | Version:
Resolution: | Keywords:
-----------------------------+------------------------------
Comment (by taylor13):
Dear Jonathan and all,
[THE FOLLOWING PARAGRAPH WAS INSERTED ONE DAY AFTER THE ORIGINAL COMMENT WAS
POSTED. PLEASE READ THIS PARAGRAPH AND COMMENT ON IT EVEN IF YOU DON'T
HAVE TIME TO READ THE REST OF THE COMMENT:
It occurs to me that there is a stop-gap measure we should take
immediately. Modify the conformance document such that it won't raise an
error if a formula_terms is attached to a variable that is not a
coordinate variable. I have reread the standard, and I can't see any
place where it specifically forbids using formula_terms outside the usage
discussed. Like other attributes, I would think this means it might also
be used in unorthodox ways without making a file inconsistent with the
standard. (For example, 'bounds' can only be expected to be interpreted
by software when attached to a coordinate variable, but, as David noted
and as Jonathan has done in practice, this does not forbid its use
elsewhere.) Similarly, I think we should be able to attach the
formula_terms attribute to a cell bounds variable without raising an
error. So, I propose that the CF checker *not* raise an error in this
case. This could be done in time for the CF 1.7 release, I think, and
would make CMIP5 and CMIP6 data pass the CF checker's checks. I might
note that no one has complained about CMIP5 files being out of compliance
with CF, so I don't think there is any software out there that relies on a
restriction of formula_terms to coordinate variables. We didn't discover
the problem until late last year when we ran the CF checker on some CMIP5
files.
NOW BACK TO THE ORIGINAL POST:]
As you know, we are about to write petabytes of hopefully CF-compliant
CMIP6 data. There is an urgent need to agree on how to proceed on this
ticket. If possible, I would like to squeeze it into CF 1.7.
To summarize this ticket: Data stored on model levels for CMIP5 was non-
compliant with the standard because formula_terms was attached to the
variable providing bounds for the vertical parametric coordinate, and
currently CF forbids this. (A formula_terms attribute can only be attached
to a coordinate variable.) We plan to include formula_terms for bounds in
CMIP6 too, so it will also be non-compliant unless we change the standard.
I proposed that formula_terms should be allowed to be attached to
variables containing bounds of coordinates as well as being attached to
variables containing the coordinates themselves.
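To make the proposal concrete, here is a minimal Python sketch of what it would mean for software reading such a file. It is only an illustration: the dictionary stands in for a netCDF file, the names a_bnds and b_bnds and all numbers are made up, ps is reduced to a single column, and it assumes the hybrid sigma pressure form p = a*p0 + b*ps. The same routine yields pressures at the coordinate nodes or at the cell bounds, depending only on which variable's formula_terms it reads.

```python
# Illustrative stand-in for a netCDF file: variable name -> attributes and data.
# Names such as a_bnds/b_bnds and all numbers are assumptions for this sketch.
FILE = {
    "lev": {"attrs": {"bounds": "lev_bnds",
                      "formula_terms": "a: a b: b p0: p0 ps: ps"},
            "data": [0.95, 0.475]},
    "lev_bnds": {"attrs": {"formula_terms": "a: a_bnds b: b_bnds p0: p0 ps: ps"},
                 "data": [[1.0, 0.9], [0.9, 0.05]]},
    "a": {"attrs": {}, "data": [0.05, 0.075]},
    "b": {"attrs": {}, "data": [0.9, 0.4]},
    "a_bnds": {"attrs": {}, "data": [[0.0, 0.1], [0.1, 0.05]]},
    "b_bnds": {"attrs": {}, "data": [[1.0, 0.8], [0.8, 0.0]]},
    "p0": {"attrs": {"units": "Pa"}, "data": 100000.0},
    "ps": {"attrs": {"units": "Pa"}, "data": 100000.0},  # one column, for brevity
}


def parse_formula_terms(text):
    """Turn 'a: a_bnds b: b_bnds' into {'a': 'a_bnds', 'b': 'b_bnds'}."""
    tokens = text.split()
    return {tokens[i].rstrip(":"): tokens[i + 1] for i in range(0, len(tokens), 2)}


def pressure(file, var_name):
    """Evaluate p = a*p0 + b*ps using the formula_terms attached to var_name."""
    terms = parse_formula_terms(file[var_name]["attrs"]["formula_terms"])
    a = file[terms["a"]]["data"]
    b = file[terms["b"]]["data"]
    p0 = file[terms["p0"]]["data"]
    ps = file[terms["ps"]]["data"]

    def combine(a_val, b_val):
        # Recurse through nested lists so (lev) and (lev, 2) shapes both work.
        if isinstance(a_val, list):
            return [combine(x, y) for x, y in zip(a_val, b_val)]
        return a_val * p0 + b_val * ps

    return combine(a, b)


p_nodes = pressure(FILE, "lev")        # pressures at the coordinate nodes
p_bounds = pressure(FILE, "lev_bnds")  # pressures at the cell bounds
print(p_nodes)
print(p_bounds)
```

Nothing here requires a bounds attribute on "a" or "b" themselves; the link from lev to lev_bnds plus the second formula_terms is enough.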
You thought that was a good idea, but also wanted to go further and allow
the bounds to be attached to the (parameter) variables pointed to by
formula_terms, even though these variables cannot in general be considered
coordinates. You thought this was “implicitly allowed by section 7.1”.
But that section is introduced with:
“To represent cells we add the attribute bounds to the appropriate
coordinate variable(s). The value of bounds is the name of the variable
that contains the vertices of the cell boundaries. We refer to this type
of variable as a "boundary variable." A boundary variable will have one
more dimension than its associated coordinate or auxiliary coordinate
variable.”
This would seem to explicitly rule out attaching bounds to any
variable other than a coordinate variable (and the parameters appearing in
formula_terms aren’t generally coordinate variables).
So, I think that however we record the values of the parameters needed to
convert the bounds of a parametric coordinate to a vertical location in
physical space, we will have to modify the current convention.
You have also argued that “it's useful to have a direct link from the
formula terms to its bounds for some calculations, because otherwise you
have to search the file to find it.” Earlier you expanded on this:
“… I think it's useful. Although "a" [one of the parameters needed to
define hybrid sigma coordinates] is not a coordinate, you might wish to do
coordinate-like things with it. If I give "a" its bounds, it makes it
self-contained, which I feel is a naturally CF way to go. For instance, I
could hand the varid of "a" to a subroutine with the request to compute
the width of the intervals in "a", in just the same way as I might do with
eta in your example, or with sigma in the previous example. Under your
scheme, the subroutine won't be able to process "a", however, because
there is no pointer from "a" to eta, without which you can't find the
bounds of "a".”
You say you might want to compute the width of “a”, but I can’t think of
any reason to do that (I noted earlier that the so-called “width” can turn
out to be 0 for some parameters). I can’t think of any use for operating
on the values of parameters at cell bounds other than to compute the
position in space of the vertical coordinate. I would note that both eta
and sigma are actual parametric coordinates, and it clearly is sometimes
useful to compute the width of coordinate cells.
In any case, I can see no added convenience in attaching a bounds
attribute to the parameters themselves, rather than attaching formula_terms
to the variable containing the bounds of the coordinate. When you come across a
variable that is a function of a parametric vertical coordinate, you would
presumably look at the formula_terms to determine what “containers” needed
defining. At that time you could note whether or not there were bounds
defined for that parametric vertical coordinate, and if there were, you
could easily extract and associate the variables containing the parameter
values at the bounds of the vertical coordinate with the variables
containing the parameter values at the coordinate nodes. This would be
quite straightforward, I should think.
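The lookup described above might be sketched as follows. This is a hypothetical helper, not code from any existing CF library: the variables dictionary maps each variable name to its attributes, and the names (lev, lev_bnds, a_bnds, b_bnds) are illustrative only.

```python
def parse_formula_terms(text):
    """Turn 'a: a b: b' into {'a': 'a', 'b': 'b'}."""
    tokens = text.split()
    return {tokens[i].rstrip(":"): tokens[i + 1] for i in range(0, len(tokens), 2)}


def pair_terms(variables, coord_name):
    """Map each formula term to (variable at nodes, variable at bounds or None)."""
    coord = variables[coord_name]
    node_terms = parse_formula_terms(coord["formula_terms"])

    # Follow the coordinate's bounds attribute, if any, and read the
    # formula_terms attached to the bounds variable.
    bounds_terms = {}
    bounds_name = coord.get("bounds")
    if bounds_name is not None:
        bounds_attrs = variables.get(bounds_name, {})
        if "formula_terms" in bounds_attrs:
            bounds_terms = parse_formula_terms(bounds_attrs["formula_terms"])

    return {term: (name, bounds_terms.get(term)) for term, name in node_terms.items()}


# Illustrative attributes only; a real file would be read with a netCDF library.
variables = {
    "lev": {"bounds": "lev_bnds", "formula_terms": "a: a b: b p0: p0 ps: ps"},
    "lev_bnds": {"formula_terms": "a: a_bnds b: b_bnds p0: p0 ps: ps"},
}

pairs = pair_terms(variables, "lev")
print(pairs["a"])  # node variable and bounds variable for the term "a"
```

With the pairing in hand, a program can find the values of "a" at the cell bounds without any pointer from "a" itself.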
Note that I don’t think we can interpret the “value of the parameter at
the bounds of a parametric vertical coordinate’s grid cell” as the “bounds
of the parameter” because cells can’t intrinsically be defined by the
parameter. The cells are defined by the parametric vertical coordinate
(which therefore has bounds). Like other variables (e.g., temperature,
humidity, etc.) that can be defined both at the coordinate locations and
the cell bounds, the parameter values can be defined at both places. But
the cell (along with its bounds) is defined by the coordinate, not the
variables that are a function of that coordinate. Do you agree?
The reason I have been so forceful in arguing against your position is
that I think it requires us to redefine what we’ve meant by a “cell”. Up
to now, a cell has been defined by the bounds attached to a variable used
as a coordinate variable. This meant that the grid cell bounds would
always have values between the coordinate values of the two cells they
separate. The concept of a physical cell (like the intervals on the number
line we were taught in elementary school) is easy to grasp. If we modify this simple
concept and allow bounds for the parameters associated with parametric
vertical coordinates, I think we make it much harder for novices to
understand what we’re talking about. How can the bounds defining
contiguous cells in 1 dimension not be monotonic? That is what would be
required if we allowed bounds to be attached to parameters rather than
limiting their use to coordinates.
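A small numerical sketch may help show the problem. The numbers are invented for illustration (not taken from any model) and assume the hybrid sigma pressure form p = a*p0 + b*ps evaluated at five cell interfaces of a single column. The pressure at the interfaces is strictly monotonic, as cell bounds should be, while the values of "a" at the same interfaces are not, and one "width" of "a" is zero.

```python
def is_strictly_monotonic(values):
    """True if values strictly increase or strictly decrease throughout."""
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


# Illustrative values at five cell interfaces, surface first (assumptions).
P0 = 100000.0  # reference pressure, Pa
PS = 100000.0  # surface pressure, Pa
a_at_bounds = [0.0, 0.0, 0.3, 0.2, 0.05]
b_at_bounds = [1.0, 0.9, 0.4, 0.1, 0.0]

# Physical location of each interface: p = a*p0 + b*ps.
pressure_at_bounds = [a * P0 + b * PS for a, b in zip(a_at_bounds, b_at_bounds)]

# The so-called "width" of each cell measured in "a" alone.
a_widths = [upper - lower for lower, upper in zip(a_at_bounds, a_at_bounds[1:])]

print(is_strictly_monotonic(pressure_at_bounds))
print(is_strictly_monotonic(a_at_bounds))
print(a_widths)
```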
I guess if you still don’t see why I’m so opposed to allowing both
options, and there are no other opinions expressed, we have two choices:
1) We allow both options
2) We remain unable to reach consensus, and CMIP6, like CMIP5, will
produce non-CF-compliant files
I anxiously await your thoughts.
best wishes,
Karl
--
Ticket URL: <https://cf-pcmdi.llnl.gov/trac/ticket/147#comment:19>
CF Metadata <http://cf-convention.github.io/>