Dear Jonathan, all,

I think that a small extension to allow for vague notions of "representativity" could be valuable, for cases in which "mean", "median" etc. would imply spurious precision. Perhaps there could be a standard way to point to a human-readable document describing the notion in more detail for that particular dataset. The alternative to this vague "representativeness" would probably be a full machine-readable description of how the value was arrived at, which would be very complex in many cases (although desirable if it could be achieved!).

The specific software problem is that CF represents temporal data using a syntax like "seconds since 1970". Time values are therefore double-precision numbers with unknown real-world precision. Certainly it is an improvement to be able to say that the value in question is "representative" of the time range "t1 to t2 seconds since whenever". However, even this can imply spurious precision, as Ken Casey has explained on this thread.

An alternative might be to specify partial dates and times as ISO8601 strings, e.g. a time axis representing data that are representative of particular days could read "2010-02-01, 2010-02-02" etc. (This is somewhat related to ticket 14, which we decided against implementing: https://cf-pcmdi.llnl.gov/trac/ticket/14.)

By the way, we also have a use case to represent some palaeoclimate data, which contain time series that are representative of months in the past: any more precision than this would be misleading. Such a time axis could be represented as "1200-01, 1200-02" etc.

Cheers,
Jon

-----Original Message-----
From: Jonathan Gregory [mailto:[email protected]] On Behalf Of Jonathan Gregory
Sent: 02 June 2010 16:10
To: Jon Blower
Cc: [email protected]
Subject: Re: [CF-metadata] bounds/precision for time axis

Dear Jon

CF doesn't provide a way to do this except by giving bounds. I think that's the right thing to do, because the length of the interval alone doesn't say when it starts and stops, which applications may need to know. The cell_methods indicates how the value represents the variation within the interval. For an intensive quantity, "point" is the default, i.e. instantaneous in time. To indicate a mean, cell_methods of "mean" should be specified. You are saying it is "representative" in some vaguer way than a mean, and it is not instantaneous. That sounds like a different cell_methods.

Perhaps it would be a good idea to allow "cell" to be specified in cell_methods for intensive quantities, to indicate a "representative" value in this vague sense. ("cell" is the default cell_methods for an extensive quantity, which relates to the entire cell and depends on its size.) I think this vagueness should in general be discouraged; it would be better to be more precise and specify "mean", "median" etc., but if you can't be precise it'd be nice to be able to say so.

What do you think? That would require a small change to the convention.

Cheers
Jonathan

> We have many datasets for which we need to express the precision of the
> time axis. For example, the OSTIA sea surface temperature dataset
> contains daily fields. The data are considered "representative" of a
> particular day, without necessarily being a simple average over the day.
> At the moment the data are registered to 12:00Z on each day, but this is
> indistinguishable from an instantaneous snapshot at this time.
>
> I guess it would be possible to express the temporal precision using the
> "bounds" attribute for the variable in question
> (http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4/cf-conventions.html#cell-boundaries),
> by specifying the start and end of each day as the
> bounds. Is there a less verbose way of providing this information,
> perhaps by stating the precision as "1 day/24 hours/whatever" as a
> single attribute?
>
> Jon
>
> --
> Dr Jon Blower

_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
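
To make the relationship between the two ideas in this thread concrete, here is a minimal Python sketch of how a partial ISO8601 string such as "2010-02-01" or "1200-01" might be expanded into the explicit cell bounds that CF currently requires. It is only an illustration under stated assumptions, not anything defined by CF: the function names (partial_iso_to_bounds, seconds_since_epoch, cell_for) are hypothetical, the reference time is assumed to be 1970-01-01T00:00:00Z, the coordinate value is placed at the midpoint of the cell (matching the 12:00Z registration mentioned for daily fields), and Python's datetime uses the proleptic Gregorian calendar, which differs from CF's default mixed Julian/Gregorian calendar for dates such as the year 1200.

```python
from datetime import datetime, timedelta, timezone

# Assumed reference time for "seconds since 1970-01-01 00:00:00 UTC".
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partial_iso_to_bounds(text):
    """Return (start, end) UTC datetimes for 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'.

    The end is exclusive: it is the first instant of the following period.
    Years before 0001 and time-of-day components are not handled.
    """
    parts = [int(p) for p in text.split("-")]
    if len(parts) == 1:
        (year,) = parts
        start = datetime(year, 1, 1, tzinfo=timezone.utc)
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    elif len(parts) == 2:
        year, month = parts
        start = datetime(year, month, 1, tzinfo=timezone.utc)
        # December rolls over to January of the following year.
        end = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    elif len(parts) == 3:
        start = datetime(*parts, tzinfo=timezone.utc)
        end = start + timedelta(days=1)
    else:
        raise ValueError(f"unsupported partial date: {text!r}")
    return start, end


def seconds_since_epoch(moment):
    """Encode a datetime as a double, in seconds since EPOCH."""
    return (moment - EPOCH).total_seconds()


def cell_for(text):
    """Return the midpoint coordinate value and the bounds for one cell."""
    start, end = partial_iso_to_bounds(text)
    lo, hi = seconds_since_epoch(start), seconds_since_epoch(end)
    return {"time": (lo + hi) / 2, "bounds": (lo, hi)}


if __name__ == "__main__":
    for label in ["2010-02-01", "2010-02-02", "1200-01"]:
        print(label, cell_for(label))
```

For "2010-02-01" this yields a coordinate value of 1265025600.0 (12:00Z) with bounds (1264982400.0, 1265068800.0). The bounds say exactly where the interval starts and stops, but as discussed above they still do not say how the value represents the interval; that remains the job of cell_methods.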
