Jonathan,

I think we are getting closer to each other, bit by bit (yay!).

My responses are interspersed below.

On 4/29/15 9:39 AM, Jonathan Gregory wrote:
Dear Jim, Chris et al.

I'm using the word "calendar" in a CF-consistent way, I believe. Maybe it's not
the best word for the concept, but nonetheless we have an attribute called
"calendar", and its sole function is to indicate which algorithm is used to
translate between components of time (YMDhms) and elapsed time (in units of
time since a reference time). So perhaps that is consistent with "calendar"
being a collection of algorithms, in Chris's text, but it's more specific
than that. It has a particular function in the interpretation of CF time
values (usually coordinates). CF sect 4.4.1 says
"In order to calculate a new date and time given a base date, base time and a
time increment one must know what calendar to use."
and I think that is the sense in which I am using "calendar".
I agree that this is a CF-consistent usage of the word calendar, but it runs against natural usage, and I think it's worth keeping that in mind.
In the "real" world, we often start with UTC timestamps that have
leap seconds accounted for, yet convert them to elapsed times using
calculators that don't account for leap seconds. This can actually
lead to elapsed time values that encode a time discontinuity and
cannot be counted on to produce accurate differences between every
pair of values.
This is a problem, I agree. We should avoid that problem for future data by
making the conventions more precise about which calculator should be used
(which calendar, in the CF sense). We can't decide for sure what was done when
encoding past data, but the conventions string records the version of CF used.

   Calendars and calculators are different things. A calendar or time
   system is (per Chris) a collection of algorithms. I started using
   the term calculator because it was shorter and more generic than
   "time handling module or library". A calculator is a particular
   implementation of a set of algorithms. This is why I ended up
   avoiding the use of the word clock as a name for the group of
   algorithms that make up a time system. A clock is a device that is
   an implementation of a set of algorithms.


I agree that we should make the documentation more precise, and warn people of the potential pitfalls of using a calculator that doesn't recognize leap seconds to create time variables from timestamps that include leap seconds (like UTC-based timestamps).
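
To make the pitfall concrete, here is a minimal Python sketch (not part of CF or of any existing library). It relies on the fact that Python's standard datetime arithmetic ignores leap seconds, so it behaves like a calculator that is not leap-second aware. The table LEAP_SECOND_DATES and the three function names are illustrative assumptions. The table is deliberately partial and lists only a few recent insertions.

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Partial, illustrative table (assumption: not a complete list).
# Each entry is the UTC instant immediately following an inserted
# leap second, i.e. 00:00:00 after a 23:59:60.
LEAP_SECOND_DATES = [
    datetime(2006, 1, 1, tzinfo=UTC),
    datetime(2009, 1, 1, tzinfo=UTC),
    datetime(2012, 7, 1, tzinfo=UTC),
    datetime(2015, 7, 1, tzinfo=UTC),
    datetime(2017, 1, 1, tzinfo=UTC),
]


def naive_elapsed_seconds(ref, t):
    """Elapsed seconds as computed by a calculator that ignores leap seconds."""
    return (t - ref).total_seconds()


def leap_seconds_between(ref, t):
    """Count the table's leap seconds inserted after ref and up to t (ref <= t)."""
    return sum(1 for d in LEAP_SECOND_DATES if ref < d <= t)


def true_elapsed_seconds(ref, t):
    """Elapsed seconds including the leap seconds listed in the table."""
    return naive_elapsed_seconds(ref, t) + leap_seconds_between(ref, t)


# Two UTC timestamps that straddle the leap second at the end of June 2012.
before = datetime(2012, 6, 30, 23, 59, 59, tzinfo=UTC)
after = datetime(2012, 7, 1, 0, 0, 0, tzinfo=UTC)

print(naive_elapsed_seconds(before, after))  # the naive calculator's answer
print(true_elapsed_seconds(before, after))   # one second more than the naive answer
```

The naive calculator puts these two timestamps one second apart, while two SI seconds actually elapsed (23:59:59 to 23:59:60 to 00:00:00). Elapsed times encoded with the naive calculator therefore give a wrong difference for any pair of values that straddles a leap second.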
I'm suggesting that we need to do two things. One is to more precisely
define what sorts of times can be used in the time reference part of
the units attribute. I just reread section 4.4, and it actually says
that the time is UTC or a time zone offset from it. I think it
should stay that way and the wording strengthened to make it
clearer.
Yes, it does say that. It's a quote from the udunits man page. However I don't
think the issue of leap seconds has been carefully considered before, so we
don't have to assume that's what it meant exactly, especially as udunits does
not support leap seconds. As previously said, and I think you may agree, it is
likely that nearly all existing time values have been encoded *without* leap
seconds, and therefore *not* UTC strictly. Therefore my alternative suggestion
is that we should add some text here that says we don't necessarily imply
leap seconds are included by mentioning UTC. This must be the case, because
the same format of time unit is used for calendars that definitely do not ever
include leap seconds i.e. all the non-real-world ones. UTC is mentioned simply
as a way to refer to the time-zone which contains the Greenwich meridian,
without summer time.

I agree that leap seconds haven't been carefully considered before. I disagree that nearly all existing time values have been encoded without leap seconds. I'd say that nearly all existing time values that were derived from true UTC timestamps are at risk of having leap second discontinuities encoded into the set of values. (See my previous response below.)

There are three issues here, so let's not conflate them. They are:

1. What to call the time system that is like UTC in overall form
   (Greenwich meridian, etc) but doesn't include leap seconds.
2. How to indicate which actual time system is being used for the time
   part of the reference time in the units attribute.
3. How to indicate whether or not the elapsed times in the time
   variable are certain to be free of leap second induced discontinuities.

The other thing I think we need to do is provide a way to indicate
that the elapsed times in a time variable are true elapsed times
that are certain to be free of leap second discontinuities, or are
possibly contaminated with leap second discontinuities. In
connection with this we would need to add clarifying language in the
CF conventions to educate people on the importance of using time
calculators that are aware of leap seconds when moving between UTC
timestamps and elapsed time values. This could be handled by adding
a modifier to a calendar name in the calendar attribute, or it could
be handled by adding a new attribute to hold this information. I
think that coming up with one or more new calendar names is a more
confusing and less useful way to accomplish this.
I don't think we should define a new attribute, because this distinction is
one which applies only to the real-world calendar. It's therefore more robust
and simpler to indicate it as a modifier to the name when applicable in the
calendar att, so making a new calendar name, in effect. But given this
discussion I agree that calling it just "UTC" may not be clear enough. The
real-world calendar is called "standard" or "gregorian". I would propose a new
possibility "proleptic_gregorian_utc", meaning the proleptic Gregorian
calendar with leap seconds inserted as applicable since 1958.
Adding a calendar such as "gregorian_utc" and redefining "gregorian" is insufficient to address all of the issues. As far as calendars and time systems go, we have two options:

 * <whatever calendar> without leap seconds
 * <whatever calendar> with UTC leap seconds

By the way, here's a pretty exhaustive discussion of all the different time systems.

http://www.ucolick.org/~sla/leapsecs/timescales.html

The best option I have come up with for naming the time system without leap seconds is "traditional". There doesn't seem to be a name for the "60 seconds in a minute, 60 minutes in an hour, 24 hours in a day" time system. The word traditional is used by the Bureau International des Poids et Mesures in their description of the non-SI units of minutes, hours, and days that are accepted for use with SI units.

So we could answer issue 1. by calling the time system that's like UTC but without leap seconds the "traditional" time system. We could also use the name "POSIX", and that is kind of evocative since it reflects the potential for glitches, but POSIX also defines an epoch date & time which this time system lacks. I'd be good either way though.

We could answer issue 2. by defining the existing calendars as including the "traditional" time system and adding more calendars with "_utc" tacked on to original names, but this glosses over the fact that we actually can't be sure what we have in existing datasets. We could instead revise the definition of the "calendar" attribute to include an optional space-separated modifier that would name the time system used for the time part of the reference date & time. So instead of adding new calendars, we would allow values of the calendar attribute to be

   calendar = "<calendar> [<time system>]"

where the '<>' indicate placeholders for the things named within them, and the '[]' indicate optional inclusion. The values for the <time system> modifier would be

 * "unknown" - This is the default if no time system modifier is
   specified. Users should be aware that the reference time in the
   units attribute may or may not be based on a system (such as UTC)
   that includes leap seconds.
 * "traditional" - This indicates that the reference time in the units
   attribute is based on a traditional "60 seconds in a minute, 60
   minutes in an hour, 24 hours in a day, base time zone at the
   Greenwich meridian" time system. No leap seconds are included in
   this time system. (Like I said, I'd be OK with calling this "posix"
   instead.)
 * "utc" - This indicates that the reference time in the units
   attribute is based on Coordinated Universal Time (UTC), which
   includes adjustments and leap seconds after 1958.

So, you could have calendar attribute combinations such as "gregorian utc" or "gregorian traditional", and "proleptic_gregorian" would indicate that there is uncertainty (which there is for all existing data sets) as to which time system was used. You could define the modifier as only applying to the "real world" calendars to prevent weird combinations such as "noleap utc". I think that this is a better way to handle this within the calendar attribute.

That leaves issue 3. The solutions to the first two issues don't address this issue. This is where the difference between a time system or calendar and a calculator comes up.

We could add a further modifier in the calendar attribute with three possible values:

 * "true_elapsed" - the elapsed time values are certain to be free of
   leap second discontinuities
 * "false_elapsed" - the elapsed time values will include one or more
   leap second discontinuities if any leap second application dates
   were included in the time span from the reference time in the units
   attribute to any of the time values.
 * "unknown" - the elapsed time values might include leap second
   discontinuities, but it is not known if they do or not. (the default)

This extends the calendar attribute definition to be

   calendar = "<calendar> [<time system>] [<elapsed time type>]"

We could do it this way. I tend to think that the last one might be better handled as a separate attribute, but I'd be OK with this approach.
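
As a rough illustration of how software might read the extended attribute value, here is a minimal Python sketch. None of it is existing CF or library behavior. The names CalendarSpec, parse_calendar, and REAL_WORLD_CALENDARS are hypothetical, and the set of "real world" calendars is an assumption. The sketch takes both modifiers to be optional, requires the time system to come before the elapsed time type, and rejects modifiers on non-real-world calendars (the optional restriction mentioned above).

```python
from typing import NamedTuple

# Modifier values as proposed in this message.
TIME_SYSTEMS = {"unknown", "traditional", "utc"}
ELAPSED_TYPES = {"unknown", "true_elapsed", "false_elapsed"}

# Assumption: which calendars count as "real world" for this sketch.
REAL_WORLD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}


class CalendarSpec(NamedTuple):
    calendar: str
    time_system: str = "unknown"
    elapsed_type: str = "unknown"


def parse_calendar(value):
    """Parse '<calendar> [<time system>] [<elapsed time type>]'."""
    tokens = value.split()
    if not tokens:
        raise ValueError("empty calendar attribute")
    calendar, modifiers = tokens[0], tokens[1:]
    if len(modifiers) > 2:
        raise ValueError(f"too many modifiers in {value!r}")
    if modifiers and calendar not in REAL_WORLD_CALENDARS:
        raise ValueError(f"modifiers are not allowed with calendar {calendar!r}")

    time_system = None
    elapsed_type = None
    for tok in modifiers:
        # A time system is only accepted before any elapsed time type.
        if time_system is None and elapsed_type is None and tok in TIME_SYSTEMS:
            time_system = tok
        elif elapsed_type is None and tok in ELAPSED_TYPES:
            elapsed_type = tok
        else:
            raise ValueError(f"unrecognized or misplaced modifier {tok!r}")

    # Missing modifiers fall back to the proposed default, "unknown".
    return CalendarSpec(calendar, time_system or "unknown", elapsed_type or "unknown")


print(parse_calendar("gregorian utc true_elapsed"))
print(parse_calendar("proleptic_gregorian"))
```

One wrinkle this exposes: "unknown" is a legal value for both modifiers, so a lone "unknown" token is ambiguous. It is harmless here because both default to "unknown", but it is one more argument for carrying the elapsed time type in a separate attribute.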

Grace and peace,

Jim
Best wishes

Jonathan
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

--
CICS-NC <http://www.cicsnc.org/> Visit us on Facebook <http://www.facebook.com/cicsnc>
*Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
/formerly NOAA’s National Climatic Data Center/
151 Patton Ave, Asheville, NC 28801
e: [email protected] <mailto:[email protected]>
o: +1 828 271 4900


