Jonathan,
I think we are getting closer to each other, bit by bit (yay!).
My responses are interspersed below.
On 4/29/15 9:39 AM, Jonathan Gregory wrote:
Dear Jim, Chris et al.
I'm using the word "calendar" in a CF-consistent way, I believe. Maybe it's not
the best word for the concept, but nonetheless we have an attribute called
"calendar", and its sole function is to indicate which algorithm is used to
translate between components of time (YMDhms) and elapsed time (in units of
time since a reference time). So perhaps that is consistent with "calendar"
being a collection of algorithms, in Chris's text, but it's more specific
than that. It has a particular function in the interpretation of CF time
values (usually coordinates). CF sect 4.4.1 says
"In order to calculate a new date and time given a base date, base time and a
time increment one must know what calendar to use."
and I think that is the sense in which I am using "calendar".
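To make this sense of "calendar" concrete, here is a minimal sketch in Python (chosen only for illustration; nothing here is specified by CF) of the two translations a calendar governs: from time components (YMDhms) to elapsed time, and back. It assumes the real-world calendar is handled by Python's datetime, which uses the proleptic Gregorian calendar and ignores leap seconds. The reference time and the function names are hypothetical examples.

```python
from datetime import datetime, timedelta

# Python's datetime arithmetic uses the proleptic Gregorian calendar and
# treats every day as exactly 86400 seconds (no leap seconds).
# Hypothetical example units: "seconds since 1970-01-01 00:00:00"
REFERENCE = datetime(1970, 1, 1, 0, 0, 0)

def to_elapsed_seconds(ymdhms, reference=REFERENCE):
    """Components of time (Y, M, D, h, m, s) -> elapsed seconds since reference."""
    return (datetime(*ymdhms) - reference).total_seconds()

def from_elapsed_seconds(seconds, reference=REFERENCE):
    """Elapsed seconds since reference -> components of time (Y, M, D, h, m, s)."""
    t = reference + timedelta(seconds=seconds)
    return (t.year, t.month, t.day, t.hour, t.minute, t.second)

value = to_elapsed_seconds((2015, 4, 29, 9, 39, 0))
print(value)
print(from_elapsed_seconds(value))  # round-trips to the original components
```

Both directions depend only on the calendar's rules; a different calendar would give a different value for the same components.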
I agree that this is a CF-consistent usage of the word calendar, but it
runs against natural usage, and I think it's worth keeping that in mind.
In the "real" world, we often start with UTC timestamps that have
leap seconds accounted for, yet convert them to elapsed times using
calculators that don't account for leap seconds. This can actually
lead to elapsed time values that encode a time discontinuity and
cannot be counted on to produce accurate differences between every
pair of values.
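A small sketch of the problem, assuming Python and a hypothetical, deliberately partial leap-second table (a real calculator would use the complete IERS list). The two functions are two calculators for the same calendar: one ignores leap seconds, the other counts them. Python's datetime cannot even represent 23:59:60, which is itself a symptom of the issue.

```python
from datetime import datetime, timedelta

# ASSUMPTION: a deliberately partial, illustrative table of UTC leap seconds.
# Each entry is a UTC day at whose end a leap second (23:59:60) was inserted.
# A real leap-second-aware calculator would use the complete IERS list.
LEAP_SECOND_DAYS = [
    datetime(1998, 12, 31),
    datetime(2005, 12, 31),
    datetime(2008, 12, 31),
    datetime(2012, 6, 30),
    datetime(2015, 6, 30),
    datetime(2016, 12, 31),
]

def naive_elapsed(t1, t2):
    """Leap-second-unaware calculator: every day is exactly 86400 s."""
    return (t2 - t1).total_seconds()

def utc_elapsed(t1, t2):
    """Leap-second-aware calculator for t1 <= t2, using the table above."""
    # A leap second inserted at the end of day D falls just before the
    # midnight that starts day D+1; count it if that midnight is in (t1, t2].
    extra = sum(1 for day in LEAP_SECOND_DAYS
                if t1 < day + timedelta(days=1) <= t2)
    return naive_elapsed(t1, t2) + extra

before = datetime(2012, 6, 30, 23, 59, 59)
after = datetime(2012, 7, 1, 0, 0, 0)
print(naive_elapsed(before, after))  # 1.0: the leap second is lost
print(utc_elapsed(before, after))    # 2.0
```

If UTC timestamps on either side of such a day are encoded with the first calculator, the stored elapsed values silently drop that second.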
This is a problem, I agree. We should avoid that problem for future data by
making the conventions more precise about which calculator should be used
(which calendar, in the CF sense). We can't decide for sure what was done when
encoding past data, but the conventions string records the version of CF used.
Calendars and calculators are different things. A calendar or time
system is (per Chris) a collection of algorithms. I started using
the term calculator because it was shorter and more generic than
"time handling module or library". A calculator is a particular
implementation of a set of algorithms. This is why I ended up
avoiding the use of the word clock as a name for the group of
algorithms that make up a time system. A clock is a device that is
an implementation of a set of algorithms.
I agree that we should make the documentation more precise, and warn
people of the potential pitfalls of using a calculator that doesn't
recognize leap seconds to create time variables from timestamps that
include leap seconds (like UTC-based timestamps).
I'm suggesting that we need to do two things. One is to more precisely
define what sorts of times can be used in the time reference part of
the units attribute. I just reread section 4.4, and it actually says
that the time is UTC or a time zone offset from it. I think it
should stay that way, and that the wording should be strengthened to make it
clearer.
Yes, it does say that. It's a quote from the udunits man page. However I don't
think the issue of leap seconds has been carefully considered before, so we
don't have to assume that's what it meant exactly, especially as udunits does
not support leap seconds. As previously said, and I think you may agree, it is
likely that nearly all existing time values have been encoded *without* leap
seconds, and therefore *not* UTC strictly. Therefore my alternative suggestion
is that we should add some text here that says we don't necessarily imply
leap seconds are included by mentioning UTC. This must be the case, because
the same format of time unit is used for calendars that definitely do not ever
include leap seconds i.e. all the non-real-world ones. UTC is mentioned simply
as a way to refer to the time-zone which contains the Greenwich meridian,
without summer time.
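The point that the same unit format also serves calendars with no leap seconds at all can be illustrated with a sketch in Python. The decode_noleap helper is hypothetical, written only for this example, since datetime supports just the Gregorian calendar. The same value in "days since 2000-01-01 00:00:00" decodes to different dates depending on the calendar attribute.

```python
from datetime import datetime, timedelta

# Same units string, two calendars. "standard" is decoded with Python's
# proleptic Gregorian datetime; "noleap" (365_day) is decoded with a small
# hypothetical helper, since datetime cannot represent a 365-day calendar.
UNITS_REFERENCE = (2000, 1, 1)  # units = "days since 2000-01-01 00:00:00"
DAYS_IN_MONTH_NOLEAP = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def decode_standard(days, ref=UNITS_REFERENCE):
    """Whole days since ref -> (Y, M, D) in the Gregorian calendar."""
    t = datetime(*ref) + timedelta(days=days)
    return (t.year, t.month, t.day)

def decode_noleap(days, ref=UNITS_REFERENCE):
    """Whole days since ref -> (Y, M, D) in a calendar with no Feb 29."""
    year, month, day = ref
    # Convert the reference to a 0-based day of year, then add the offset.
    doy = sum(DAYS_IN_MONTH_NOLEAP[: month - 1]) + (day - 1) + days
    year += doy // 365
    doy %= 365
    month = 1
    for length in DAYS_IN_MONTH_NOLEAP:
        if doy < length:
            break
        doy -= length
        month += 1
    return (year, month, doy + 1)

print(decode_standard(59), decode_noleap(59))    # Feb 29 vs Mar 1
print(decode_standard(365), decode_noleap(365))  # Dec 31 vs Jan 1
```

The units string alone says nothing about leap days, so it cannot be read as implying leap seconds either; that information has to come from the calendar attribute.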
I agree that leap seconds haven't been carefully considered before. I
disagree that nearly all existing time values have been encoded without
leap seconds. I'd say that nearly all existing time values that were
derived from true UTC timestamps are at risk of having leap second
discontinuities encoded into the set of values. (See my previous
response below.)
There are three issues here, so let's not conflate them. They are:
1. What to call the time system that is like UTC in overall form
(Greenwich meridian, etc) but doesn't include leap seconds.
2. How to indicate which actual time system is being used for the time
part of the reference time in the units attribute.
3. How to indicate whether or not the elapsed times in the time
variable are certain to be free of leap second induced discontinuities.
The other thing I think we need to do is provide a way to indicate
that the elapsed times in a time variable are true elapsed times
that are certain to be free of leap second discontinuities, or are
possibly contaminated with leap second discontinuities. In
connection with this we would need to add clarifying language in the
CF conventions to educate people on the importance of using time
calculators that are aware of leap seconds when moving between UTC
timestamps and elapsed time values. This could be handled by adding
a modifier to a calendar name in the calendar attribute, or it could
be handled by adding a new attribute to hold this information. I
think that coming up with one or more new calendar names is a more
confusing and less useful way to accomplish this.
I don't think we should define a new attribute, because this distinction is
one which applies only to the real-world calendar. It's therefore more robust
and simpler to indicate it as a modifier to the name when applicable in the
calendar attribute, so making a new calendar name, in effect. But given this
discussion I agree that calling it just "UTC" may not be clear enough. The
real-world calendar is called "standard" or "gregorian". I would propose a new
possibility "proleptic_gregorian_utc", meaning the proleptic Gregorian
calendar with leap seconds inserted as applicable since 1958.
Adding a calendar such as "gregorian_utc" and redefining "gregorian"
is insufficient to address all of the issues. As far as
calendars and time systems go, we have two options:
* <whatever calendar> without leap seconds
* <whatever calendar> with UTC leap seconds
By the way, here's a pretty exhaustive discussion of all the different
time systems.
http://www.ucolick.org/~sla/leapsecs/timescales.html
The best option I have come up with for naming the time system without
leap seconds is "traditional". There doesn't seem to be a name for the
"60 seconds in a minute, 60 minutes in an hour, 24 hours in a day" time
system. The word traditional is used by the Bureau International des
Poids et Mesures in their description of the non-SI units of minutes,
hours, and days that are accepted for use with SI units.
So we could answer issue 1. by calling the time system that's like UTC
but without leap seconds the "traditional" time system. We could also
use the name "POSIX", and that is kind of evocative since it reflects
the potential for glitches, but POSIX also defines an epoch date & time
which this time system lacks. I'd be fine either way, though.
We could answer issue 2. by defining the existing calendars as including
the "traditional" time system and adding more calendars with "_utc"
tacked on to original names, but this glosses over the fact that we
actually can't be sure what we have in existing datasets. We could
instead revise the definition of the "calendar" attribute to include an
optional space-separated modifier that would name the time system used
for the time part of the reference date & time. So instead of adding new
calendars, we would allow values of the calendar attribute to be
calendar = "<calendar> [<time system>]"
where the '<>' indicate placeholders for the things named within them,
and the '[]' indicate optional inclusion. The values for the <time
system> modifier would be
* "unknown" - This is the default if no time system modifier is
specified. Users should be aware that the reference time in the
units attribute may or may not be based on a system (such as UTC)
that includes leap seconds.
* "traditional" - This indicates that the reference time in the units
attribute is based on a traditional "60 seconds in a minute, 60
minutes in an hour, 24 hours in a day, base time zone at the
Greenwich meridian" time system. No leap seconds are included in
this time system. (Like I said, I'd be OK with calling this "posix"
instead.)
* "utc" - This indicates that the reference time in the units
attribute is based on Coordinated Universal Time (UTC), which
includes adjustments and leap seconds after 1958.
So, you could have calendar attribute combinations such as "gregorian
utc" or "gregorian traditional", while a bare "proleptic_gregorian" (with
no modifier) would indicate that there is uncertainty (which there is for all existing data
sets) as to which time system was used. You could define the modifier as
only applying to the "real world" calendars to prevent weird
combinations such as "noleap utc". I think that this is a better way to
handle this within the calendar attribute.
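A minimal sketch, in Python, of how software might read the proposed two-part calendar attribute. The function name parse_calendar is hypothetical, and the set of "real world" calendars allowed to take a modifier is my assumption. It rejects combinations such as "noleap utc" and defaults the time system to "unknown".

```python
# Hypothetical parser for the proposed form:
#     calendar = "<calendar> [<time system>]"
# ASSUMPTION: which calendars count as "real world" is illustrative only.
REAL_WORLD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}
TIME_SYSTEMS = {"unknown", "traditional", "utc"}

def parse_calendar(attr):
    """Return (calendar, time_system); time_system defaults to "unknown"."""
    parts = attr.split()
    if not parts or len(parts) > 2:
        raise ValueError(f"malformed calendar attribute: {attr!r}")
    calendar = parts[0]
    time_system = parts[1] if len(parts) == 2 else "unknown"
    if time_system not in TIME_SYSTEMS:
        raise ValueError(f"unrecognized time system: {time_system!r}")
    if len(parts) == 2 and calendar not in REAL_WORLD_CALENDARS:
        # e.g. "noleap utc" is rejected: only real-world calendars take a modifier
        raise ValueError(f"{calendar!r} does not take a time system modifier")
    return calendar, time_system

print(parse_calendar("gregorian utc"))        # ('gregorian', 'utc')
print(parse_calendar("proleptic_gregorian"))  # ('proleptic_gregorian', 'unknown')
```

Because the modifier is optional and defaults to "unknown", every existing calendar value remains valid and keeps its current (uncertain) meaning.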
That leaves issue 3. The solutions to the first two issues don't address
this issue. This is where the difference between a time system or
calendar and a calculator comes up.
We could add a further modifier in the calendar attribute with three
possible values:
* "true_elapsed" - the elapsed time values are certain to be free of
leap second discontinuities
* "false_elapsed" - the elapsed time values will include one or more
leap second discontinuities if any leap second application dates
were included in the time span from the reference time in the units
attribute to any of the time values.
* "unknown" - the elapsed time values might include leap second
discontinuities, but it is not known if they do or not. (the default)
This extends the calendar attribute definition to be
calendar = "<calendar> [<time system>] [<elapsed time type>]"
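Extending the same idea, here is a hypothetical Python sketch of the three-part form. One design wrinkle: "unknown" is a legal value for both modifiers, so in this sketch a single modifier is taken as the time system unless it is "true_elapsed" or "false_elapsed". That resolution rule is my assumption, not part of the proposal.

```python
# Hypothetical parser for:
#     calendar = "<calendar> [<time system>] [<elapsed time type>]"
# ASSUMPTION: the "real world" calendar set and the rule for resolving a
# lone modifier are illustrative choices.
REAL_WORLD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}
TIME_SYSTEMS = {"unknown", "traditional", "utc"}
ELAPSED_TYPES = {"unknown", "true_elapsed", "false_elapsed"}

def parse_calendar_full(attr):
    """Return (calendar, time_system, elapsed_type); omitted parts are "unknown"."""
    parts = attr.split()
    if not parts or len(parts) > 3:
        raise ValueError(f"malformed calendar attribute: {attr!r}")
    calendar, mods = parts[0], parts[1:]
    if mods and calendar not in REAL_WORLD_CALENDARS:
        raise ValueError(f"{calendar!r} does not take modifiers")
    time_system, elapsed_type = "unknown", "unknown"
    if len(mods) == 2:
        time_system, elapsed_type = mods
    elif len(mods) == 1:
        # A lone modifier is an elapsed time type only if it cannot be a time system.
        if mods[0] in ELAPSED_TYPES and mods[0] not in TIME_SYSTEMS:
            elapsed_type = mods[0]
        else:
            time_system = mods[0]
    if time_system not in TIME_SYSTEMS:
        raise ValueError(f"unrecognized time system: {time_system!r}")
    if elapsed_type not in ELAPSED_TYPES:
        raise ValueError(f"unrecognized elapsed time type: {elapsed_type!r}")
    return calendar, time_system, elapsed_type

print(parse_calendar_full("gregorian utc true_elapsed"))
print(parse_calendar_full("gregorian false_elapsed"))
```

A separate attribute for the elapsed time type would avoid this positional ambiguity altogether.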
We could do it this way. I tend to think that the last one might be
better handled as a separate attribute, but I'd be OK with this approach.
Grace and peace,
Jim
Best wishes
Jonathan
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
--
CICS-NC <http://www.cicsnc.org/> Visit us on
Facebook <http://www.facebook.com/cicsnc> *Jim Biard*
*Research Scholar*
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
/formerly NOAA’s National Climatic Data Center/
151 Patton Ave, Asheville, NC 28801
e: [email protected] <mailto:[email protected]>
o: +1 828 271 4900
/We will be updating our social media soon. Follow our current Facebook
(NOAA National Climatic Data Center
<https://www.facebook.com/NOAANationalClimaticDataCenter> and NOAA
National Oceanographic Data Center <https://www.facebook.com/noaa.nodc>)
and Twitter (@NOAANCDC <https://twitter.com/NOAANCDC> and @NOAAOceanData
<https://twitter.com/NOAAOceanData>) accounts for the latest information./