@JimBiardCics wrote:
> For the moment, let's set aside the question of names for the calendars.

OK, though a bit hard to talk about :-)

> In a perfect world, all data producers would use a leap-second-aware function 
> to turn their lists of time stamps into elapsed time values and all time 
> variables would be "perfect". 

almost -- I think TAI time is also a perfectly valid system for a "perfect" 
world :-)

But yeah, most datetime libraries do not handle leap seconds, which, kind of 
ironically, means that folks are using TAI time even if they think they are 
using UTC :-)
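As a minimal sketch of what I mean: Python's built-in datetime does naive, 
leap-second-free arithmetic. There was a real leap second at 
2016-12-31T23:59:60 UTC, so the true elapsed time between the two instants 
below is 2 seconds, but naive subtraction says 1:

```python
from datetime import datetime

# Instants one second on either side of the 2016-12-31 leap second.
t1 = datetime(2016, 12, 31, 23, 59, 59)
t2 = datetime(2017, 1, 1, 0, 0, 0)

# True UTC elapsed time is 2 seconds (23:59:59 -> 23:59:60 -> 00:00:00),
# but naive (effectively TAI-style) arithmetic says 1.
print((t2 - t1).total_seconds())  # -> 1.0
```

So anyone subtracting these timestamps is getting TAI-style arithmetic whether 
they know it or not.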

> Does naive conversion of UTC time stamps into elapsed times have the 
> potential to produce non-monotonic time coordinate variables that violate the 
> CF conventions? Yes. Does it cause any real problems (for the vast majority 
> of cases and instances of time) if people use this "broken" method for 
> encoding and decoding their time stamps? No.

I'm not so sure -- I think having a time axis that is "non-metric", as you call 
it, can be a real problem. Yes, it could potentially be turned into a series of 
correct UTC timestamps by reversing the same incorrect math used to produce it, 
but many use cases work with the time axis in time units (seconds, hours, 
etc.), and need it to have nice properties like being monotonic and 
differentiable.

> we are trying to make a way for data producers to signal to data users how 
> they should handle the values in their time variables while staying within 
> the existing CF time framework and acknowledging CF and world history 
> regarding the way we deal with time

Fair enough -- a worthy goal.

>There is nothing at all wrong with specifying that the epoch time stamp in the 
>units attribute always be a correct UTC time stamp. In fact, allowing the 
>epoch time stamp to be from a TAI or UTC clock will increase the chances that 
>the data will be handled incorrectly. If you are sophisticated enough to care 
>about TAI, you will have no problem dealing with a UTC time stamp.

I disagree here -- the truth is that TAI is EASIER to deal with -- most 
datetime libraries handle it just fine; in fact, it is all they handle 
correctly. So I think a calendar that is explicitly TAI is a good idea.

I think we are converging on a few decisions:

1) Due to legacy, uninformed users, poor library support, and the fact that it 
just doesn't matter for most use cases, we will have an "ambiguous with regard 
to leap seconds" calendar in CF. Probably called "gregorian", because that's 
what we already have, and, explicit or not, that's what it means in existing 
datasets. So we need some better docs here.

2) Do we need an explicit "UTC" calendar, in which leap seconds ARE taken into 
account? The file would only be correct if the timestamp is "proper" UTC, and 
you would get the right (UTC) timestamps back if and only if you used a 
leap-second-aware time library. The values themselves would be "metric" (by 
Jim's definition).

3) Do we need an explicit "TAI" calendar? The file would only be correct if the 
timestamp is "proper" TAI, and you would get the right (TAI) timestamps back if 
and only if you did not apply leap seconds. The values themselves would be 
"metric" (by Jim's definition).

Note that the only actual difference between (2) and (3) is whether the 
timestamp is in UTC or TAI -- the two have diverged since 1958 and currently 
differ by 37 seconds. In either case, the values themselves would be "proper", 
and you could compute differences, etc. easily and correctly.
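To make the (2)-vs-(3) difference concrete, here's a sketch. The helper names 
`decode_tai` and `decode_utc` are hypothetical, and the one-entry leap-second 
table is obviously incomplete -- a real UTC decoder needs the full IERS list:

```python
from datetime import datetime, timedelta

EPOCH = datetime(2016, 12, 31, 0, 0, 0)  # reference timestamp from the units attribute

# Leap seconds inserted after the epoch, as elapsed-seconds positions.
# Only one entry here (2016-12-31T23:59:60 occurs 86400 s after EPOCH);
# a real decoder needs the full IERS table.
LEAP_AFTER_EPOCH = [86400]

def decode_tai(value):
    """Calendar (3): pure naive arithmetic, no leap seconds at all."""
    return EPOCH + timedelta(seconds=value)

def decode_utc(value):
    """Calendar (2): the value counts real SI seconds since a UTC epoch,
    so leap seconds elapsed inside the interval must be removed before
    doing naive datetime math."""
    leaps = sum(1 for pos in LEAP_AFTER_EPOCH if pos < value)
    return EPOCH + timedelta(seconds=value - leaps)

# The same stored value, 86401 elapsed seconds after the epoch:
print(decode_tai(86401))  # -> 2017-01-01 00:00:01
print(decode_utc(86401))  # -> 2017-01-01 00:00:00 (the day had 86401 seconds)
```

Both calendars keep the values "metric"; the only difference is the bookkeeping 
applied when turning values back into timestamps.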

4) Minor point -- do we disallow "days" in any of these, or are we happy with 
1 day == 24 hours == 86400 seconds? I'm fine with days defined this way -- it 
is almost always correct, and always what people expect. (Though it could cause 
issues, maybe, with some datetime libs, but only those that account for leap 
seconds, so I doubt it.)
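For what it's worth, this fixed-length definition is exactly what Python's 
timedelta already assumes:

```python
from datetime import timedelta

# timedelta treats a day as exactly 24 hours == 86400 seconds;
# it has no notion of leap seconds at all.
assert timedelta(days=1) == timedelta(hours=24) == timedelta(seconds=86400)
print(timedelta(days=1).total_seconds())  # -> 86400.0
```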

5) I think this is the contentious one: Do we have a calendar (encoding, 
really) that is:

Elapsed time since a UTC timestamp, but with the elapsed time computed from a 
correct-with-regard-to-leap-seconds UTC time stamp using a library that does 
not account for leap seconds. This would mean that the values themselves may 
not be "metric".

I think this is what Jim is proposing.

(by the way, times into the future (when leap-seconds are an unknown) as of the 
creation of the file should be disallowed)

Arguments for (Jim, you can add here :-) )

* People are already creating time variables like this -- it would be nice to 
be able to explicitly declare that that's what you've done, so folks can 
interpret them exactly correctly.

* Since a lot of instruments, computers, etc. use UTC time with leap seconds 
applied, and most time-processing libraries don't support leap seconds, folks 
will continue to produce such data -- and, in fact, have little choice but to 
do so.

Arguments against:

* This is technically incorrect data -- it says "seconds since", but it isn't 
actually always seconds since. We should not allow incorrect data as CF 
compliant. Bad libraries are not CF's responsibility.

* A time axis created this way will be non-"metric" -- that is, you can't 
compute elapsed time correctly directly from the values. This is likely to lead 
to confusion, and, worse still, to hard-to-detect hidden bugs -- code that 
works on almost any dataset might suddenly fail if a value happens to fall near 
a leap second, and you get a zero-length "second" (or even a negative one? -- 
is that possible?).
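A minimal sketch of that failure mode, assuming a hypothetical producer that 
clamps the unrepresentable 23:59:60 to 23:59:59 before doing naive "seconds 
since" arithmetic:

```python
# Coordinates ("seconds since 2016-12-31T00:00:00") for three samples taken
# one real (SI) second apart: 23:59:59, 23:59:60 (the leap second), 00:00:00.
# The hypothetical producer clamps 23:59:60 to 23:59:59, so the middle
# coordinate repeats.
coords = [86399.0, 86399.0, 86400.0]

diffs = [b - a for a, b in zip(coords, coords[1:])]
print(diffs)  # -> [0.0, 1.0]: a zero-length "second" on the axis
```

Code that assumes a strictly increasing, evenly spaced coordinate works on 
every other day of the year and silently breaks right here.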

* (same as above, really) -- a time variable of this sort can only be used 
correctly if it is first converted to UTC timestamps.

* There may be issues with processing this data with some (most?) time 
libraries (in particular the ones that don't account for leap seconds). This is 
because if you convert to a UTC timestamp with leap seconds, you can get a 
minute with 61 seconds in it, for example:

December 31, 2016 at 23:59:60 UTC

And some time libraries do not allow that.

Example with Python's datetime:
```
In [3]: from datetime import datetime

In [4]: datetime(2016, 12, 31, 23, 59, 60)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-a8e1ba1d62e5> in <module>()
----> 1 datetime(2016, 12, 31, 23, 59, 60)

ValueError: second must be in 0..59
```

Given these trade-offs, I think CF should not support this -- but if others 
feel differently, fine -- just do not call it "UTC" or "TAI"! -- and document 
it carefully!

That last point is key -- this entire use case is predicated on the idea that 
folks are working with full-on, proper, leap-second-aware UTC timestamps, but 
processing them with a non-leap-second-aware library -- and that this is a 
fully definable and reversible process. But at least with one commonly used 
datetime library (Python's built-in datetime), it simply will not work for 
every single case. It will work for almost every case, so someone could process 
this data for years and never notice, but it's not actually correct! In fact, I 
suspect most computer systems can't handle December 31, 2016 at 23:59:60 UTC, 
and will never give you that value -- rather, (IIUC) they accommodate leap 
seconds by resetting the internal clock so that "seconds since the epoch" gives 
the correct UTC time when computed without leap seconds. But that reset happens 
at best one second too late (so you will never see that invalid timestamp).

All this leads me to believe that if anyone really cares about sub-second-level 
precision over a period of years, then they really, really should be using TAI; 
and if they THINK they are getting one-second precision, they probably aren't, 
or have hidden bugs waiting to happen. I don't think we should support that in 
CF.

Final point:

> When you read time out of a GPS unit, you can get a count of seconds since 
> the GPS epoch, and I believe you can get a GPS time stamp that doesn't 
> contain leap seconds (like TAI time stamps, but with a fixed offset from 
> TAI), but most people get a UTC time stamp. The GPS messages carry the 
> current leap second count and receivers apply it by default when generating 
> time stamps. 

OK -- but I suspect that yes, most people get a UTC timestamp, and most people 
don't understand the difference, and most people don't care about second-level 
accuracy over years.

The "over years" part is because if you have, say, a GPS track you are trying 
to encode in CF, you should use a reference timestamp that is close to your 
data -- maybe the start of the day you took the track. So unless you happen to 
be collecting data when a leap second occurs, there will be no problem.

For those few people that really do care about utmost precision -- they should 
use the TAI timestamp from their GPS -- and if it's a consumer-grade GPS that 
doesn't provide that -- they should get a new GPS! It's probably easier to 
figure out how to get TAI time from a GPS than it is to find a 
leap-second-aware time library :-)

Side note: Is anyone aware of a proper leap-second aware time library??

Sorry for the really long note -- but I do think we are converging here, and I 
won't put up a stink if folks want to add the sort-of-UTC calendar -- as long 
as it's well named and well documented.

-CHB

https://github.com/cf-convention/cf-conventions/issues/148#issuecomment-434805310