All,
I'm a bit confused as to why this is as big a deal as it seems to be. I
_think_ I understand the implication of different calendars, leap
seconds, etc., but:
CF encodes time as "some unit of time since an epoch". e.g. "seconds
since 2015-05-08T00:00:00+00:00"
This encoding makes the whole calendar thing a LOT easier than it
could be, because the ONLY place the calendar matters is in the epoch
specification.
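This is roughly what client code does with such an axis, shown as a minimal Python sketch. It assumes the epoch is an ISO 8601 string that datetime.fromisoformat accepts. It also assumes every day is exactly 86400 s, which holds because Python's datetime uses the proleptic Gregorian calendar and ignores leap seconds. The helper name decode_seconds_since is hypothetical and not part of any CF library.

```python
from datetime import datetime, timedelta


def decode_seconds_since(values, epoch_iso):
    """Convert 'seconds since <epoch>' offsets into datetime objects.

    Hypothetical helper: assumes epoch_iso is accepted by
    datetime.fromisoformat (Python 3.7+) and that every day is 86400 s.
    """
    epoch = datetime.fromisoformat(epoch_iso)
    # Adding a timedelta applies Python's proleptic Gregorian rules;
    # leap seconds are ignored.
    return [epoch + timedelta(seconds=v) for v in values]


times = decode_seconds_since([0, 3600, 86400], "2015-05-08T00:00:00+00:00")
for t in times:
    print(t.isoformat())
```

The calendar only enters through the epoch and the rule used to add offsets to it. The stored numbers themselves are just elapsed seconds.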
So for the most part, it's up to the client code to figure out what
calendar to use, and how to use it (if there is a need to translate
the rest of the time series to "human date-times"). In fact, a while
back there was a discussion of allowing ISO 8601 strings, rather than
this "time since" stuff, for the time axis -- darn good we didn't go with that!
So: which calendar the epoch is specified in should be clearly
defined, but:
* data creators know what they want -- and if they use an epoch near the
time of concern, it gets much harder for the users to get it wrong.
* if users interpret it wrong, it only really matters when you are
comparing the data in this file with data from some other source, and
doing it differently for those two -- pretty unlikely, actually.
* leap seconds rarely matter in this context -- you process your data
with or without them, but the worst possible result is the entire data
set being off by a couple of seconds from another whole data set -- and if
the content creators use an epoch near where the data actually are,
then it won't even be off.
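The size of the leap-second effect can be checked with a small Python sketch. It computes the elapsed time between two UTC instants with and without leap seconds. The leap-second table is deliberately partial: it lists only the leap seconds at the ends of 2012-06-30 and 2015-06-30. The function name is hypothetical, and real software should take the list from the IERS bulletins.

```python
from datetime import datetime, timezone

# Illustrative, deliberately partial table: UTC midnights that were
# immediately preceded by a positive leap second (23:59:60 the day before).
# Do not use this list for real work; take it from the IERS bulletins.
LEAP_MIDNIGHTS = [
    datetime(2012, 7, 1, tzinfo=timezone.utc),
    datetime(2015, 7, 1, tzinfo=timezone.utc),
]


def elapsed_seconds(start, end, with_leap_seconds):
    """Seconds from start to end, optionally counting leap seconds."""
    naive = (end - start).total_seconds()  # every day counted as 86400 s
    if not with_leap_seconds:
        return naive
    extra = sum(1 for m in LEAP_MIDNIGHTS if start < m <= end)
    return naive + extra


start = datetime(2015, 6, 1, tzinfo=timezone.utc)
end = datetime(2015, 8, 1, tzinfo=timezone.utc)
print(elapsed_seconds(start, end, False))  # 5270400.0
print(elapsed_seconds(start, end, True))   # 5270401.0

# With an epoch after the last leap second, both methods agree.
near = datetime(2015, 7, 2, tzinfo=timezone.utc)
print(elapsed_seconds(near, end, True) - elapsed_seconds(near, end, False))  # 0.0
```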
In short -- yes, CF should be clear and precise, but we don't need to
get all worked up about older CF data sets not having a clearly
defined default calendar -- if it really matters (which it very rarely
will), then presumably new datasets will be created with well defined
calendars.
Just my $0.02
-Chris
On Mon, May 11, 2015 at 7:35 AM, Jim Biard <[email protected]> wrote:
Jonathan,
I still think there's
On 5/10/15 3:29 AM, Jonathan Gregory wrote:
Dear all
A postscript to my last message. I am not myself convinced that the backward
incompatibility (for data-writing) that I suggested below is really worth the
pain it would cause! I think it might well be OK to retain the existing
calendar name of "gregorian" from CF1.7, but give it a more precise definition,
in order to eliminate the ambiguity about leap seconds. (Of course, there is
nothing we can do to remedy this with existing data.)
With all the arguments otherwise unaltered, that would make my proposal:
* We redefine "gregorian", and introduce a new calendar "gregorian_utc", for
the real-world calendar, without and with UTC leap seconds respectively.
* We abolish the "standard" calendar (since one thing this discussion has
shown is that there is not a single standard!) and we require the calendar
always to be specified (no default).
* We state that all the other calendars have fixed-length days with no leap
seconds.
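The three rules could be applied in data-reading software roughly as in the Python sketch below. The table and the function are hypothetical, since nothing like this exists in CF yet. "standard" is deliberately absent because the proposal abolishes it, and a missing calendar attribute raises an error instead of falling back to a default.

```python
# Hypothetical table for the proposal above: calendar name -> whether the
# encoded elapsed times count UTC leap seconds. Only the real-world
# calendar has a leap-second variant; all others use fixed 86400 s days.
LEAP_SECONDS_COUNTED = {
    "gregorian": False,
    "gregorian_utc": True,
    "proleptic_gregorian": False,
    "julian": False,
    "noleap": False,
    "all_leap": False,
    "360_day": False,
    "none": False,
}


def counts_leap_seconds(attributes):
    """Return True if the time variable's encoding includes leap seconds."""
    calendar = attributes.get("calendar")
    if calendar is None:
        raise ValueError("calendar attribute is required (no default)")
    if calendar not in LEAP_SECONDS_COUNTED:
        raise ValueError(f"unrecognised calendar: {calendar!r}")
    return LEAP_SECONDS_COUNTED[calendar]


print(counts_leap_seconds({"calendar": "gregorian_utc"}))  # True
print(counts_leap_seconds({"calendar": "noleap"}))         # False
```

Keeping the leap-second choice inside the calendar name means an invalid combination, such as a noleap calendar with leap seconds, cannot be written at all.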
My further suggestions, about dealing with the Julian/Gregorian transition
and negative years in better ways, are unaffected (except to retain the name
"gregorian"). Those are an optional extra, dealing with a different subject,
but it would be opportune to do both at once.
Cheers
Jonathan
----- Forwarded message from Jonathan Gregory <[email protected]> -----
Dear Jim
Yes, I'm glad we appear to be approaching an agreement!
"In order to calculate a new date and time given a base date, base time and a
time increment one must know what calendar to use."
and I think that is the sense in which I am using "calendar".
I agree that this is a CF-consistent usage of the word calendar, but
it runs against natural usage, and I think it's worth keeping that
in mind.
OK. Perhaps we can clarify what we mean in the CF convention.
I agree that leap seconds haven't been carefully considered before.
I disagree that nearly all existing time values have been encoded
without leap seconds. I'd say that nearly all existing time values
that were derived from true UTC timestamps are at risk of having
leap second discontinuities encoded into the set of values.
All right. In that case I think we may have to take more serious steps to
avoid future problems. I'll come back to that.
There are three issues here, so let's not conflate them. They are:
1. What to call the time system that is like UTC in overall form
(Greenwich meridian, etc) but doesn't include leap seconds.
2. How to indicate which actual time system is being used for the time
part of the reference time in the units attribute.
3. How to indicate whether or not the elapsed times in the time
variable are certain to be free of leap second induced discontinuities.
This is a useful classification, thanks, but I don't think the situation is
quite as complicated as that.
I'm not sure (2) is something we need to be concerned about - but it may
be I've missed a point. In my understanding, there is no calendar implied by
the reference time, because it's given as a timestamp YYYY-MM-DD hh:mm:ss
[+-hh:mm], where the [] is the optional time-zone. A timestamp is
calendar-neutral; it can be interpreted in any calendar, except that some dates
will be illegal in some calendars, and some times might be illegal if leap
seconds are in effect. The reference date-time is used for encoding and
decoding in the calendar specified, with or without leap seconds, but itself
implies nothing about the time system. We could state it as a requirement of
CF that the reference date and time must be legal in the encoding which
applies to the time coordinate.
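The suggested legality requirement could be checked with a Python sketch like the one below. It parses the timestamp form described above and tests whether the reference date-time exists in a given encoding. The regular expression and function names are hypothetical. Only four fixed-rule calendars are handled, and the time zone is parsed but not otherwise used.

```python
import re

# Hypothetical pattern for "<unit> since YYYY-MM-DD [hh:mm:ss] [+-hh:mm]".
UNITS_RE = re.compile(
    r"^(?P<unit>\w+) since (?P<y>-?\d+)-(?P<mo>\d{1,2})-(?P<d>\d{1,2})"
    r"(?: (?P<h>\d{1,2}):(?P<mi>\d{2}):(?P<s>\d{2}))?"
    r"(?: ?(?P<tz>[+-]\d{1,2}:\d{2}))?$"
)

MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_in_month(calendar, year, month):
    """Month length in a few fixed-rule calendars (others not handled)."""
    if calendar == "360_day":
        return 30
    if month != 2:
        return MONTH_DAYS[month - 1]
    if calendar == "noleap":
        return 28
    if calendar == "all_leap":
        return 29
    if calendar == "proleptic_gregorian":
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    raise ValueError(f"calendar not handled in this sketch: {calendar}")


def reference_is_legal(units, calendar, leap_seconds=False):
    """Check that the reference date-time exists in the given encoding."""
    m = UNITS_RE.match(units)
    if m is None:
        return False
    year, month, day = int(m["y"]), int(m["mo"]), int(m["d"])
    if not 1 <= month <= 12:
        return False
    if not 1 <= day <= days_in_month(calendar, year, month):
        return False
    hour, minute, second = int(m["h"] or 0), int(m["mi"] or 0), int(m["s"] or 0)
    if hour > 23 or minute > 59:
        return False
    # Second 60 exists only when leap seconds are in effect (this sketch
    # does not check that the date is an actual leap-second day).
    return second < 60 or (leap_seconds and second == 60)


print(reference_is_legal("days since 2015-02-29", "noleap"))   # False
print(reference_is_legal("days since 2015-02-30", "360_day"))  # True
```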
Regarding part of (1), we need a name for the time-zone which applies at the
Greenwich meridian without summer/daylight-saving time. We shouldn't use "UTC",
as the CF standard currently does (quoting from udunits(3)), because that's
confusing. Maybe it should just be stated explicitly as I have done!
I agree that it's important to write an explicit definition of the
timestamp that doesn't reference UTC. I don't see timestamps as
being time system neutral any more than datestamps are calendar
neutral. They may be interpretable by more than one system, but
that's not the same thing in my mind as neutrality.
You can read a Gregorian date as a Julian date, but you are going
to be off by a number of days. You can read a UTC time as a
traditional time, but you are going to be off by a number of
seconds. The new calendars you propose would address the question
of what systems (calendar and time) were used for all parts of the
time reference in the units attribute. That is what they
accomplish, they address point (2). They don't really address
point (3) at all.
To address point (3), I continue to favour a subspecies of calendar name,
rather than a modifier or separate attribute, because this distinction only
applies to the real-world calendars. A decomposition of metadata is cumbersome
if it isn't generally relevant. Suppose X can take values X1 or X2; if it's X1
then Y can be Y1 or Y2, whereas Y is irrelevant if it's X2. In that situation
I would have a single attribute with possible values X1_Y1, X1_Y2 and X2. A
single attribute is easier for scanning a dataset, less work to write and
read, and more likely to be correct because it's less likely that Y will not
be coded if relevant or will be coded if irrelevant.
I suggest that leap seconds are needed only in the Gregorian calendar, i.e.
the real-world one. I think it's unlikely the proleptic Gregorian calendar will
be used with leap seconds; you only need this calendar if you're going back
more than several centuries, in which case it's probably not a dataset which
has UTC precision in time, and it's most often used with models, which do not
have leap seconds. (However, a leap-second variety could be introduced if I am
wrong and it is needed.) Leap seconds are not needed in the noleap, all_leap,
360_day and none calendars, which are all for model worlds. The julian calendar
is used astronomically and might be used in models, but the web page you
cited (http://www.ucolick.org/~sla/leapsecs/timescales.html) points out that
it's not a good idea to try to use it with leap seconds since it's based on
units of day (=86400 s).
Point (3) relates to the question of which calculator
(implementation of a calendar and/or time system as software for
converting date and/or time stamps to elapsed times since an
epoch) was used to create the elapsed time values found in a given
time variable. This question is largely independent of the
question of which systems are represented by the reference date
and time in the units attribute. An example in terms of dates
alone may help clarify the issue.
I have a set of datestamps that are based on the Gregorian
calendar. I calculate elapsed days from the reference date stored
in my units attribute on my time variable and populate my time
variable with the values. I use a calculator that is based on the
rules of the Julian calendar to get my elapsed times.
As long as the span of dates from the reference date to the last
datestamp in my set doesn't cross any year where the two calendars
differ on whether or not to add a leap day, the values stored in
my time variable will be correct. But a 1-day discontinuity will
be encoded into my elapsed day values each time I do cross a place
where the Julian and Gregorian calendars differ.
If I turn my elapsed days back into datestamps using the same
Julian date calculator, no one will notice. If I instead use a
Gregorian date calculator to recover datestamps and one or more
discontinuities were encoded, I will find that I don't get back
the same set of datestamps I started with. If I try to take
differences between time variable values and one or more
discontinuities were encoded, I will find that the results will
contain an error if the difference is taken across the location of
a discontinuity.
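Jim's datestamp example can be reproduced with a short Python sketch, in which all function names are hypothetical. The "Julian calculator" is modelled simply as the Julian leap-year rule (every fourth year is a leap year) applied to Gregorian datestamps, which is the mistake described above. The sketch does not model the 1582 transition. 1900 is a leap year under the Julian rule but not under the Gregorian one, so the discontinuity appears at 1900-03-01.

```python
MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap(year, rule):
    if rule == "julian":
        return year % 4 == 0
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def month_length(year, month, rule):
    extra = 1 if month == 2 and is_leap(year, rule) else 0
    return MONTH_DAYS[month - 1] + extra


def day_number(ymd, rule):
    """Days from 0001-01-01 to ymd, counted with the given leap-year rule."""
    y, m, d = ymd
    days = sum(366 if is_leap(year, rule) else 365 for year in range(1, y))
    days += sum(month_length(y, month, rule) for month in range(1, m))
    return days + d - 1


def from_day_number(n, rule):
    """Inverse of day_number under the same rule."""
    year = 1
    while n >= (366 if is_leap(year, rule) else 365):
        n -= 366 if is_leap(year, rule) else 365
        year += 1
    month = 1
    while n >= month_length(year, month, rule):
        n -= month_length(year, month, rule)
        month += 1
    return (year, month, n + 1)


def encode(ref, ymd, rule):
    """Elapsed days from the reference date, as stored in the time variable."""
    return day_number(ymd, rule) - day_number(ref, rule)


def decode(ref, elapsed, rule):
    return from_day_number(day_number(ref, rule) + elapsed, rule)


ref = (1899, 1, 1)
for stamp in [(1900, 2, 28), (1900, 3, 1)]:
    print(stamp, encode(ref, stamp, "julian"), encode(ref, stamp, "gregorian"))
# (1900, 2, 28) 423 423
# (1900, 3, 1) 425 424

# Encode 1900-03-01 with the Julian rule, decode with the Gregorian rule:
print(decode(ref, encode(ref, (1900, 3, 1), "julian"), "gregorian"))  # (1900, 3, 2)
```

Decoding with the same Julian rule gives back (1900, 3, 1), so the error only shows up when a different calculator is used or when differences are taken across the discontinuity.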
This may sound a bit silly when speaking of Gregorian vs Julian
calendars, but this is exactly what has been happening on the time
system level when people have received UTC timestamps and naively
used the *nix time functions to create elapsed time values to
store into time variables. More explicitly defining the time
system used for the time part of the reference date and time in
the units attribute via an expanded calendar definition does not
tell you how the elapsed times stored in the time variable were
calculated.
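The same failure mode at the time-system level can be shown with Python's standard library. This minimal sketch uses calendar.timegm as a stand-in for the *nix time functions. It is the documented inverse of time.gmtime, i.e. POSIX time, which treats every day as 86400 s and has no leap seconds. The leap second at the end of 2015-06-30 is used as the example.

```python
import calendar

# calendar.timegm counts POSIX seconds since 1970-01-01 with every day
# exactly 86400 s long, like the naive *nix conversion described above.
before = calendar.timegm((2015, 6, 30, 23, 59, 59))
leap = calendar.timegm((2015, 6, 30, 23, 59, 60))  # the actual leap second
after = calendar.timegm((2015, 7, 1, 0, 0, 0))

# Two SI seconds really elapsed between 'before' and 'after', but:
print(after - before)  # 1
print(leap == after)   # True: two distinct UTC instants get the same value
```

Nothing in the resulting time variable records that the conversion was done this way, which is the point about the units attribute not telling you how the values were calculated.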
If this is the case, then I would propose that in the next version of CF:
* We introduce two new calendars, "gregorian_noleaps" and "gregorian_utc", for
the real-world calendar, without and with UTC leap seconds respectively. I
suggest "noleaps" instead of "traditional", which you put forward, because
"noleaps" is more self-explanatory, I think. I agree with you that "POSIX" is
not so good because it implies a reference time.
I agree with Ben Hetland that if we were to go with new calendars
to address point (2) we should use a name like
gregorian_noleapseconds or gregorian_noleapsec. I disagree with
Ben about the question of how difficult it is to parse a fixed
"calendar [time-system [encoding]]" sequence.
I think that backward compatibility, among other things, argues in
favor of adding trailing space-separated modifiers to existing
calendar names. Even if we end up going with some version of your
proposed new calendar names, it's important to understand that
those new names and definitions don't solve the issue in point (3).
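A fixed "calendar [time-system [encoding]]" sequence is easy to parse, as this Python sketch shows. The function name is hypothetical, and the modifier words in the example ("utc", "noleapsec") are placeholders rather than agreed vocabulary. Missing trailing fields come back as None, so existing single-word calendar values parse unchanged. That unchanged parsing is the backward-compatibility argument made above.

```python
def parse_calendar_attribute(value):
    """Split a 'calendar [time-system [encoding]]' attribute into 3 fields.

    Hypothetical sketch: missing trailing fields are returned as None, so
    existing single-word values such as 'gregorian' still parse.
    """
    parts = value.split()
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 space-separated words: {value!r}")
    return tuple(parts + [None] * (3 - len(parts)))


print(parse_calendar_attribute("gregorian"))                # ('gregorian', None, None)
print(parse_calendar_attribute("gregorian utc noleapsec"))  # ('gregorian', 'utc', 'noleapsec')
```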
* We abolish the "gregorian" and "standard" calendars, and we require the
calendar always to be specified (no default). This would be quite a radical
step, but forcing the noleaps/utc property to be explicitly stated is the only
way I can see to avoid in future the ambiguity about whether elapsed times have
been encoded with or without leap seconds. This does not invalidate existing
data that adheres to CF1.6 or earlier, but it would invalidate existing
data-writing software that does not write the calendar attribute, and it would
require data-reading software to recognise the new calendars (although if it
assumed the existing default for them that would not be too bad).
* We state that all the other calendars have fixed-length days with no leap
seconds.
What do you think? Is it worth the pain?
Although it's going to complicate this discussion, if we decided to abolish the
existing default, we could take this opportunity to deal with problems with the
mixed Julian/Gregorian calendar, as in http://cf-trac.llnl.gov/trac/ticket/96,
opened by Dave Allured, but never concluded. To do that I would further propose
(copying some text from that ticket):
* The gregorian_noleaps and gregorian_utc calendars can be used only to encode
dates since 1582-10-15, and must not have reference dates earlier than that.
This means they cannot cross the Julian/Gregorian transition. This is desirable
because there are ambiguities introduced by assuming different dates for the
change in calendar. By this rule, the common choice of 1-1-1 as a reference
date would be disallowed in these calendars, for example.
* We introduce a new calendar "mixed_gregorian_julian", which is the calendar
of udunits, with no leap seconds. However, we make it stricter than currently,
in these ways: (1) The reference date is not allowed to be any of the dates in
the transitional period 1582-10-5 to 1582-10-14 inclusive. (2) Neither the
reference date nor any date which is encoded with this calendar is allowed to
be a negative year. (3) Year 0 is interpreted as climatological time in this
calendar, following COARDS, but this is deprecated in favour of the CF
conventions of Sect 7.4.
* Because of problems caused by the discontinuity, it is recommended that the
mixed_gregorian_julian calendar be used only in datasets with real-world
historical dates which span the change of calendar from Julian to Gregorian.
In datasets with real-world historical dates that all precede the change of
calendar, the julian calendar should be used. In datasets with real-world
historical dates that all follow the change of calendar, and in simulated
datasets in which there is no change of calendar, the proleptic_gregorian
calendar should be used.
* We disallow dates to be encoded or reference dates to be used in year zero or
negative years for the julian calendar, because it's ambiguous whether this
calendar has a year 0.
* We state that year 0 is valid in the proleptic_gregorian, noleap, all_leap,
360_day and none calendars.
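Software might check the restrictions proposed above along the lines of this Python sketch. Nothing here is part of CF, and the function names are hypothetical. Dates are (year, month, day) tuples, which compare chronologically. 1582-10-04 is taken as the last Julian date, which follows from the 1582-10-5 to 1582-10-14 gap named in the proposal.

```python
# Hypothetical checks for the rules proposed above (none of this is in CF).
# Dates are (year, month, day) tuples, which compare chronologically.
GREGORIAN_START = (1582, 10, 15)
LAST_JULIAN_DAY = (1582, 10, 4)
GAP = ((1582, 10, 5), (1582, 10, 14))


def reference_date_problems(calendar, ymd):
    """List the proposed rules that a reference date would break."""
    year = ymd[0]
    problems = []
    if calendar in ("gregorian_noleaps", "gregorian_utc") and ymd < GREGORIAN_START:
        problems.append("before 1582-10-15")
    if calendar == "mixed_gregorian_julian":
        if GAP[0] <= ymd <= GAP[1]:
            problems.append("inside the 1582-10-05..1582-10-14 transition")
        if year < 0:
            problems.append("negative year")
    if calendar == "julian" and year <= 0:
        problems.append("year zero or negative year")
    # Year 0 raises no problem in proleptic_gregorian, noleap, all_leap,
    # 360_day or none, as proposed.
    return problems


def recommended_calendar(first, last):
    """Suggest a calendar for real-world historical dates first..last."""
    if last <= LAST_JULIAN_DAY:
        return "julian"
    if first >= GREGORIAN_START:
        return "proleptic_gregorian"
    return "mixed_gregorian_julian"


print(reference_date_problems("gregorian_utc", (1, 1, 1)))  # ['before 1582-10-15']
print(recommended_calendar((1500, 1, 1), (1700, 1, 1)))    # mixed_gregorian_julian
```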
If these further proposals complicate the previous discussion, we can defer
them until we've reached agreement on leap seconds!
Best wishes
Jonathan
----- End forwarded message -----
_______________________________________________
CF-metadata mailing list
[email protected] <mailto:[email protected]>
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Grace and peace,
Jim
--
CICS-NC <http://www.cicsnc.org/>
Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
North Carolina State University <http://ncsu.edu/>
NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
formerly NOAA’s National Climatic Data Center
151 Patton Ave, Asheville, NC 28801
e: [email protected]
o: +1 828 271 4900
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
[email protected]