Chris,

I agree that the implications of mistakes in which calendar and/or time system was used when creating or using a given file are often negligible. This is one of those situations where it most often doesn't really matter, but sometimes it can matter a lot. If you are working with 1 km resolution data from a satellite that is moving at ~7 km/sec, then even a 1 second discrepancy is significant. If the time span over your full set of files is long enough that a leap second event is included, then even time differences taken between points within your set of files can be wrong.
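
To put rough numbers on the satellite case, here is a minimal sketch. The constants are just the approximate figures quoted above, and the function names are made up for illustration.

```python
# Approximate figures from the paragraph above; not tied to any specific mission.
SPEED_KM_PER_S = 7.0   # satellite speed, roughly
PIXEL_SIZE_KM = 1.0    # nominal data resolution


def along_track_error_km(timing_error_s, speed_km_per_s=SPEED_KM_PER_S):
    """Distance the satellite covers during a given timing error."""
    return abs(timing_error_s) * speed_km_per_s


def error_in_pixels(timing_error_s, pixel_size_km=PIXEL_SIZE_KM):
    """The same error expressed in units of the data resolution."""
    return along_track_error_km(timing_error_s) / pixel_size_km


print(error_in_pixels(1.0))  # 7.0
```

Under these assumptions a one-second timing error misplaces an observation by about seven pixels.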

Because this is something that is often insignificant, I proposed a combination of clarified definitions and additional attributes and/or optional modifiers to calendar names that allow for greater precision when it matters without sacrificing backward compatibility for large numbers of datasets.

Grace and peace,

Jim

On 5/11/15 2:18 PM, Chris Barker wrote:
All,

I'm a bit confused as to why this is as big a deal as it seems to be. I _think_ I understand the implications of different calendars, leap seconds, etc., but:

CF encodes time as "some unit of time since an epoch". e.g. "seconds since 2015-05-08T00:00:00+00:00"

This encoding makes the whole calendar thing a LOT easier than it could be, because the ONLY place the calendar matters is in the epoch specification.
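
A minimal sketch of decoding the "seconds since an epoch" encoding, using only the Python standard library. It assumes the proleptic Gregorian rules built into `datetime` and fixed 86400-second days (no leap seconds); `decode_cf_seconds` is a hypothetical helper name, not part of any CF library.

```python
from datetime import datetime, timedelta


def decode_cf_seconds(values, epoch_iso):
    """Turn 'seconds since <epoch>' offsets into datetime objects.

    Assumption: Python's datetime arithmetic, i.e. proleptic Gregorian
    dates and no leap seconds.
    """
    epoch = datetime.fromisoformat(epoch_iso)
    return [epoch + timedelta(seconds=v) for v in values]


times = decode_cf_seconds([0, 3600, 86400], "2015-05-08T00:00:00+00:00")
for t in times:
    print(t.isoformat())
```

The stored numbers are plain offsets. Which calendar rules turn them back into dates is decided entirely by the decoder.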

So for the most part, it's up to the client code to figure out what calendar to use, and how to use it (if there is a need to translate the rest of the time series to "human date-times"). In fact, a while back there was a discussion of allowing ISO8601 strings, rather than this "time since" stuff for time axis -- darn good we didn't go with that!

So: which calendar the epoch is specified in should be clearly defined, but:

data creators know what they want -- and if they use an epoch near the time of concern, it gets much harder for the users to get it wrong.

if users interpret it wrong, it only really matters when you are comparing the data in this file with data from some other source, and doing it differently for those two -- pretty unlikely, actually.

leap seconds rarely matter in this context -- you process your data with or without them, but the worst possible result is the entire data set being off a couple seconds from another whole data set -- and if the content creators use an epoch near where the data actually are, then that won't even be off.

In short -- yes, CF should be clear and precise, but we don't need to get all worked up about older CF data sets not having a clearly defined default calendar -- if it really matters (which it very rarely will), then presumably new datasets will be created with well defined calendars.

Just my $0.02


-Chris



On Mon, May 11, 2015 at 7:35 AM, Jim Biard <[email protected]> wrote:

    Jonathan,

    I still think there's


    On 5/10/15 3:29 AM, Jonathan Gregory wrote:
    Dear all

    A postscript to my last message. I am not myself convinced that the backward
    incompatibility (for data-writing) that I suggested below is really worth the
    pain it would cause! I think it might well be OK to retain the existing
    calendar name of "gregorian" from CF1.7, but give it a more precise definition,
    in order to eliminate the ambiguity about leap seconds. (Of course, there is
    nothing we can do to remedy this with existing data.)

    With all the arguments otherwise unaltered, that would make my proposal:

    * We redefine "gregorian", and introduce a new calendar "gregorian_utc", for
    the real-world calendar, without and with UTC leap seconds respectively.

    * We abolish the "standard" calendar (since one thing this discussion has
    shown is that there is not a single standard!) and we require the calendar
    always to be specified (no default).

    * We state that all the other calendars have fixed-length days with no leap
    seconds.

    My further suggestions, about dealing with the Julian/Gregorian transition
    and negative years in better ways, are unaffected (except to retain the name
    "gregorian"). Those are an optional extra, dealing with a different subject,
    but it would be opportune to do both at once.

    Cheers

    Jonathan

    ----- Forwarded message from Jonathan Gregory <[email protected]> -----

    Dear Jim

    Yes, I'm glad we appear to be approaching an agreement!

    "In order to calculate a new date and time given a base date, base time and a
    time increment one must know what calendar to use."
    and I think that is the sense in which I am using "calendar".
    I agree that this is a CF-consistent usage of the word calendar, but
    it runs against natural usage, and I think it's worth keeping that
    in mind.
    OK. Perhaps we can clarify what we mean in the CF convention.

    I agree that leap seconds haven't been carefully considered before.
    I disagree that nearly all existing time values have been encoded
    without leap seconds. I'd say that nearly all existing time values
    that were derived from true UTC timestamps are at risk of having
    leap second discontinuities encoded into the set of values.
    All right. In that case I think we may have to take more serious steps to
    avoid future problems. I'll come back to that.

    There are three issues here, so let's not conflate them. They are:

    1. What to call the time system that is like UTC in overall form
        (Greenwich meridian, etc) but doesn't include leap seconds.
    2. How to indicate which actual time system is being used for the time
        part of the reference time in the units attribute.
    3. How to indicate whether or not the elapsed times in the time
        variable are certain to be free of leap second induced discontinuities.
    This is a useful classification, thanks, but I don't think the situation is
    quite as complicated as that.

    I'm not sure (2) is something we need to be concerned about - but it may
    be I've missed a point. In my understanding, there is no calendar implied by
    the reference time, because it's given as a timestamp YYYY-MM-DD hh:mm:ss
    [+-hh:mm], where the [] is the optional time-zone. A timestamp is
    calendar-neutral; it can be interpreted in any calendar, except that some dates
    will be illegal in some calendars, and some times might be illegal if leap
    seconds are in effect. The reference date-time is used for encoding and decoding
    in the calendar specified, with or without leap seconds, but itself implies
    nothing about the time system. We could state it as a requirement of CF that the
    reference date and time must be legal in the encoding which applies to the time
    coordinate.
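
The legality requirement suggested above can be sketched as a small checker. This is an illustration only: the calendar names follow CF, but `is_legal_timestamp` and its `leap_seconds` flag are hypothetical, and the leap-second rule is simplified to "a 60th second is allowed only in the last minute of a day".

```python
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap_year(year, calendar):
    """Leap-year rule for a handful of CF calendar names."""
    if calendar == "noleap":
        return False
    if calendar == "all_leap":
        return True
    if calendar == "julian":
        return year % 4 == 0
    if calendar == "proleptic_gregorian":
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    raise ValueError(f"calendar not handled in this sketch: {calendar}")


def days_in_month(year, month, calendar):
    if calendar == "360_day":
        return 30  # every month has 30 days in this calendar
    extra = 1 if month == 2 and is_leap_year(year, calendar) else 0
    return DAYS_IN_MONTH[month - 1] + extra


def is_legal_timestamp(year, month, day, hour, minute, second,
                       calendar, leap_seconds=False):
    """True if the timestamp exists under the given calendar and time system."""
    if not 1 <= month <= 12:
        return False
    if not 1 <= day <= days_in_month(year, month, calendar):
        return False
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        return False
    if 0 <= second < 60:
        return True
    # Simplification: accept a 60th second only at 23:59 when leap seconds apply.
    return leap_seconds and 60 <= second < 61 and hour == 23 and minute == 59


print(is_legal_timestamp(2015, 2, 30, 0, 0, 0, "360_day"))              # True
print(is_legal_timestamp(2015, 2, 30, 0, 0, 0, "proleptic_gregorian"))  # False
```

The same string of digits is legal under one calendar and illegal under another, which is why the check has to be made against the encoding that applies to the time coordinate.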

    Regarding part of (1), we need a name for the time-zone which applies at the
    Greenwich meridian without summer/daylight-saving time. We shouldn't use "UTC",
    as the CF standard currently does (quoting from udunits(3)), because that's
    confusing. Maybe it should just be stated explicitly as I have done!
    I agree that it's important to write an explicit definition of the
    timestamp that doesn't reference UTC. I don't see timestamps as
    being time system neutral any more than datestamps are calendar
    neutral. They may be interpretable by more than one system, but
    that's not the same thing in my mind as neutrality.

    You can read a Gregorian date as a Julian date, but you are going
    to be off by a number of days. You can read a UTC time as a
    traditional time, but you are going to be off by a number of
    seconds. The new calendars you propose would address the question
    of what systems (calendar and time) were used for all parts of the
    time reference in the units attribute. That is what they
    accomplish, they address point (2). They don't really address
    point (3) at all.
    To address point (3), I continue to favour a subspecies of calendar name,
    rather than a modifier or separate attribute, because this distinction only
    applies to the real-world calendars. A decomposition of metadata is cumbersome
    if it isn't generally relevant. Suppose X can take values X1 or X2; if it's X1
    then Y can be Y1 or Y2, whereas Y is irrelevant if it's X2. In that situation I
    would have a single attribute with possible values X1_Y1, X1_Y2 and X2. A single
    attribute is easier for scanning a dataset, less work to write and read, and
    more likely to be correct because it's less likely that Y will not be coded if
    relevant or will be coded if irrelevant.

    I suggest that the leap seconds are needed only in the Gregorian calendar i.e.
    the real-world one. I think it's unlikely the proleptic Gregorian calendar will
    be used with leap-seconds; you only need this calendar if you're going back
    more than several centuries, in which case it's probably not a dataset which
    has UTC precision in time, and it's most often used with models, which do not
    have leap seconds. (However, a leap-second variety could be introduced if I am
    wrong and it is needed.) Leap seconds are not needed in the noleap, all_leap,
    360_day and none calendars, which are all for model worlds. The julian calendar
    is used astronomically and might be used in models, but the web page you
    cited (http://www.ucolick.org/~sla/leapsecs/timescales.html) points out that
    it's not a good idea to try to use it with leap seconds since it's based on
    units of day (=86400 s).
    Point (3) relates to the question of which calculator
    (implementation of a calendar and/or time system as software for
    converting date and/or time stamps to elapsed times since an
    epoch) was used to create the elapsed time values found in a given
    time variable. This question is largely independent of the
    question of which systems are represented by the reference date
    and time in the units attribute. An example in terms of dates
    alone may help clarify the issue.

    I have a set of datestamps that are based on the Gregorian
    calendar. I calculate elapsed days from the reference date stored
    in my units attribute on my time variable and populate my time
    variable with the values. I use a calculator that is based on the
    rules of the Julian calendar to get my elapsed times.

    As long as the span of dates from the reference date to the last
    datestamp in my set doesn't cross any year where the two calendars
    differ on whether or not to add a leap day, the values stored in
    my time variable will be correct. But a 1-day discontinuity will
    be encoded into my elapsed day values each time I do cross a place
    where the Julian and Gregorian calendars differ.

    If I turn my elapsed days back into datestamps using the same
    Julian date calculator, no one will notice. If I instead use a
    Gregorian date calculator to recover datestamps and one or more
    discontinuities were encoded, I will find that I don't get back
    the same set of datestamps I started with. If I try to take
    differences between time variable values and one or more
    discontinuities were encoded, I will find that the results will
    contain an error if the difference is taken across the location of
    a discontinuity.
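
The Gregorian/Julian example above can be reproduced in a few lines. This is a sketch: the two day-number functions use the standard integer Julian Day Number formulas, and the reference date and datestamps are chosen here only because they straddle 1900-02-29, which exists in the Julian calendar but not in the Gregorian one.

```python
from datetime import date, timedelta


def jdn_gregorian(year, month, day):
    """Julian Day Number of a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def jdn_julian(year, month, day):
    """Julian Day Number of a Julian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def elapsed_days(stamps, reference, calculator):
    """Days since `reference` for each datestamp, using one calculator."""
    ref = calculator(*reference)
    return [calculator(*s) - ref for s in stamps]


REFERENCE = (1900, 1, 1)
STAMPS = [(1900, 2, 28), (1900, 3, 1)]  # Gregorian datestamps, one day apart

right = elapsed_days(STAMPS, REFERENCE, jdn_gregorian)
wrong = elapsed_days(STAMPS, REFERENCE, jdn_julian)
print(right)  # [58, 59]
print(wrong)  # [58, 60]

# Decode the Julian-encoded values with a Gregorian calculator
# (Python's date type follows proleptic Gregorian rules).
recovered = [date(*REFERENCE) + timedelta(days=n) for n in wrong]
print([d.isoformat() for d in recovered])  # ['1900-02-28', '1900-03-02']
```

Values before the crossing agree, the difference across the crossing is off by one day, and decoding with the other calculator does not return the original datestamps.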

    This may sound a bit silly when speaking of Gregorian vs Julian
    calendars, but this is exactly what has been happening on the time
    system level when people have received UTC timestamps and naively
    used the *nix time functions to create elapsed time values to
    store into time variables. More explicitly defining the time
    system used for the time part of the reference date and time in
    the units attribute via an expanded calendar definition does not
    tell you how the elapsed times stored in the time variable were
    calculated.
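
The same effect at the time-system level can be shown with Python's `calendar.timegm`, which follows the POSIX convention of 86400-second days. The example uses the leap second inserted at 2012-06-30T23:59:60 UTC; the constant `LEAP_SECONDS_IN_SPAN` is entered by hand for this illustration rather than looked up from a leap-second table.

```python
import calendar

# Two UTC timestamps on either side of the 2012-06-30 leap second.
before = (2012, 6, 30, 23, 59, 59)
after = (2012, 7, 1, 0, 0, 0)

# POSIX-style arithmetic: every day is exactly 86400 seconds long.
posix_elapsed = calendar.timegm(after) - calendar.timegm(before)

# Hand-entered for this sketch: one leap second falls between the two stamps.
LEAP_SECONDS_IN_SPAN = 1
true_elapsed = posix_elapsed + LEAP_SECONDS_IN_SPAN

print(posix_elapsed)  # 1
print(true_elapsed)   # 2
```

Elapsed times built this way from true UTC timestamps carry a one-second discontinuity at the leap second, whatever the calendar attribute says about the reference time.
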
    If this is the case, then I would propose that in the next version of CF:

    * We introduce two new calendars, "gregorian_noleaps" and "gregorian_utc", for
    the real-world calendar, without and with UTC leap seconds respectively. I
    suggest "noleaps" instead of "traditional", which you put forward, because
    "noleaps" is more self-explanatory, I think. I agree with you that "POSIX" is
    not so good because it implies a reference time.
    I agree with Ben Hetland that if we were to go with new calendars
    to address point (2) we should use a name like
    gregorian_noleapseconds or gregorian_noleapsec. I disagree with
    Ben about the question of how difficult it is to parse a fixed
    "calendar [time-system [encoding]]" sequence.

    I think that backward compatibility, among other things, argues in
    favor of adding trailing space-separated modifiers to existing
    calendar names. Even if we end up going with some version of your
    proposed new calendar names, it's important to understand that
    those new names and definitions don't solve the issue in point (3).
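
As a rough indication of how little parsing a fixed "calendar [time-system [encoding]]" sequence needs, here is a sketch. The function and field names are hypothetical, and the modifier tokens in the usage lines ("utc", "leapsec_free") are placeholders, not proposed CF vocabulary.

```python
from typing import NamedTuple, Optional


class CalendarSpec(NamedTuple):
    calendar: str
    time_system: Optional[str]
    encoding: Optional[str]


def parse_calendar_attribute(value: str) -> CalendarSpec:
    """Split a space-separated calendar attribute into up to three parts.

    Missing trailing modifiers come back as None, so an existing value
    such as "gregorian" still parses unchanged.
    """
    parts = value.split()
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 tokens, got {len(parts)}: {value!r}")
    padded = parts + [None] * (3 - len(parts))
    return CalendarSpec(*padded)


print(parse_calendar_attribute("gregorian"))
print(parse_calendar_attribute("gregorian utc leapsec_free"))
```

Because the modifiers are optional and trailing, an attribute written by existing software parses to the bare calendar name with both modifiers absent.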

    * We abolish the "gregorian" and "standard" calendars, and we require the
    calendar always to be specified (no default). This would be quite a radical
    step, but forcing the noleaps/utc property to be explicitly stated is the only
    way I can see to avoid in future the ambiguity about whether elapsed times have
    been encoded with or without leap seconds. This does not invalidate existing
    data that adheres to CF1.6 or earlier, but it would invalidate existing
    data-writing software that does not write the calendar attribute, and it would
    require data-reading software to recognise the new calendars (although if it
    assumed the existing default for them that would not be too bad).

    * We state that all the other calendars have fixed-length days with no leap
    seconds.

    What do you think? Is it worth the pain?

    Although it's going to complicate this discussion, if we decided to abolish the
    existing default, we could take this opportunity to deal with problems with the
    mixed Julian/Gregorian calendar, as in http://cf-trac.llnl.gov/trac/ticket/96,
    opened by Dave Allured, but never concluded. To do that I would further propose
    (copying some text from that ticket):

    * The gregorian_noleaps and gregorian_utc calendars can be used only to encode
    dates since 1582-10-15, and must not have reference dates earlier than that.
    This means they cannot cross the Julian/Gregorian transition. This is desirable
    because there are ambiguities introduced by assuming different dates for the
    change in calendar. By this rule, the common choice of 1-1-1 as a reference
    date would be disallowed in these calendars, for example.

    * We introduce a new calendar "mixed_gregorian_julian", which is the calendar
    of udunits, with no leap seconds. However we make it stricter than currently,
    in these ways: (1) The reference date is not allowed to be any of the dates in
    the transitional period 1582-10-5 to 1582-10-14 inclusive. (2) Neither the
    reference date nor any date which is encoded with this calendar is allowed to
    be a negative year. (3) Year 0 is interpreted as climatological time in this
    calendar, following COARDS, but this is deprecated in favour of the CF
    conventions of Sect 7.4.

    * Because of problems caused by the discontinuity, it is recommended that the
    mixed_gregorian_julian calendar be used only in datasets with real-world
    historical dates which span the change of calendar from Julian to Gregorian. In
    datasets with real-world historical dates that all precede the change of
    calendar, the julian calendar should be used. In datasets with real-world
    historical dates that all follow the change of calendar, and in simulated
    datasets in which there is no change of calendar, the proleptic_gregorian
    calendar should be used.

    * We disallow dates to be encoded or reference dates to be used in year zero or
    negative years for the julian calendar, because it's ambiguous whether this
    calendar has a year 0.

    * We state that year 0 is valid in the proleptic_gregorian, noleap, all_leap,
    360_day and none calendars.
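
The reference-date rules in the bullets above could be checked mechanically along these lines. This is a sketch of the rules as proposed in this thread, not of anything adopted in CF; `check_reference_date` is a hypothetical name, and only the reference-date restrictions are covered (not the encoded dates).

```python
def check_reference_date(calendar, year, month, day):
    """Return a list of rule violations for a reference date (empty if none)."""
    ymd = (year, month, day)
    problems = []
    if calendar in ("gregorian_noleaps", "gregorian_utc"):
        # These calendars may not reach back across the Julian/Gregorian change.
        if ymd < (1582, 10, 15):
            problems.append("reference date is earlier than 1582-10-15")
    elif calendar == "mixed_gregorian_julian":
        if (1582, 10, 5) <= ymd <= (1582, 10, 14):
            problems.append("reference date is in the 1582-10-5 to 1582-10-14 gap")
        if year < 0:
            problems.append("negative year")
    elif calendar == "julian":
        if year <= 0:
            problems.append("year zero or negative year")
    # proleptic_gregorian, noleap, all_leap, 360_day and none:
    # year 0 is valid, so nothing is checked here.
    return problems


print(check_reference_date("gregorian_utc", 1, 1, 1))
print(check_reference_date("proleptic_gregorian", 0, 1, 1))
```

The first call reports the common 1-1-1 reference date as disallowed under the proposed gregorian_utc calendar, while the second returns an empty list.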

    If these further proposals complicate the previous discussion, we can defer
    them until we've reached agreement on leap seconds!

    Best wishes

    Jonathan

    ----- End forwarded message -----
    _______________________________________________
    CF-metadata mailing list
    [email protected]
    http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
    Grace and peace,

    Jim
    --
    CICS-NC <http://www.cicsnc.org/> Visit us on
    Facebook <http://www.facebook.com/cicsnc>
    *Jim Biard*
    *Research Scholar*
    Cooperative Institute for Climate and Satellites NC
    <http://cicsnc.org/>
    North Carolina State University <http://ncsu.edu/>
    NOAA National Centers for Environmental Information
    <http://ncdc.noaa.gov/>
    /formerly NOAA’s National Climatic Data Center/
    151 Patton Ave, Asheville, NC 28801
    e: [email protected]
    o: +1 828 271 4900








--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]





