[Numpy-discussion] Re: NumPy-Discussion Digest, Vol 183, Issue 33

Lev Maximov Fri, 31 Dec 2021 04:29:35 -0800

Hey, Stefano!

The level of pedantry is absolutely acceptable.


I don't question any of your arguments. They are all perfectly valid.

Except that I'd rather say it is ~29 seconds if measured against 1970.
Leap seconds were introduced in 1972 and a total of 27 have been added
since then, but TAI has been ticking since 1958 and had gained 10 seconds
by 1970. That is approximately 0.83 seconds per year, which gives approx.
28.67 sec between today and 1970.
So 1970 is a bad choice of epoch if you want to introduce a
leap-second-aware datetime.
In GPS time they chose 1980. In TAI it is 1958, but that is somewhat worse
than 1980 because it is not immediately clear how to perform the conversion
timestamp<->timedelta between 1958 and 1970.
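
One reading of the estimate above can be spelled out as a short Python
sketch. The figures are the ones quoted in the text; the variable names, and
the assumption that the pre-1972 drift rate is simply extrapolated over
1970-1972, are illustrative only.

```python
# Figures quoted in the text above (not independently verified here).
leap_seconds_since_1972 = 27      # leap seconds inserted since 1972
tai_gain = 10                     # seconds TAI had gained, per the text
years_of_drift = 1970 - 1958      # TAI has been ticking since 1958

# Average drift rate before leap seconds existed.
rate = tai_gain / years_of_drift  # seconds per year

# Assumption: the same rate applies to the two years between the 1970 epoch
# and the introduction of leap seconds in 1972.
estimate = leap_seconds_since_1972 + rate * (1972 - 1970)

print(round(rate, 2), round(estimate, 2))  # 0.83 28.67
```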

Something like 'proleptic GPS time' would be needed to estimate the number
of leap seconds in the years before 1972, when they were introduced. Or
maybe the leap-second timescale could be limited to start at 1972 and
not accept any timestamps before that date.

A system that ignores the existence of leap seconds has a right to
exist.
It just has limited applicability.

np.datetime64 keeps time as a delta between the moment in time and a
predefined epoch.
Which standard does it use to translate this delta to human-readable time
in years,
months, and so on?
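
As a rough illustration of the question, the sketch below (assuming NumPy is
installed) looks at the integer that datetime64 stores and at how it behaves
across the leap second of 2016-12-31. The variable names are illustrative.

```python
import numpy as np

before = np.datetime64('2016-12-31T23:59:59', 's')
after = np.datetime64('2017-01-01T00:00:00', 's')

# The stored value is a plain integer count of units since 1970-01-01.
count = int(after.astype('int64'))

# Midnight falls on an exact multiple of 86400, i.e. every day is treated
# as exactly 86400 seconds long (Posix semantics).
whole_days, remainder = divmod(count, 86400)

# Only one second is reported across an interval that, in UTC, also
# contained 23:59:60.
step = int((after - before) / np.timedelta64(1, 's'))

print(count, whole_days, remainder, step)
```

So the translation behaves like Posix time: the leap second simply has no
representation in the count.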

If it is UTC, then it must handle times like 2016-12-31 23:59:60, because
it is a valid UTC timestamp.
>>> np.datetime64('2016-12-31 12:59:60')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Seconds out of range in datetime string "2016-12-31 12:59:60"

datetime also fails (so far) to handle it (here dt is datetime.datetime):
>>> dt(2016,12,31,23,59,60)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: second must be in 0..59

But `time` works. Well, at least it doesn't raise an exception:
>>> t = time.struct_time((2016,12,31,12,59,60,0,0,0)); t
time.struct_time(tm_year=2016, tm_mon=12, tm_mday=31, tm_hour=12,
tm_min=59, tm_sec=60, tm_wday=0, tm_yday=0, tm_isdst=0)
>>> time.asctime(t)
'Mon Dec 31 12:59:60 2016'
>>> time.gmtime(calendar.timegm(t))
time.struct_time(tm_year=2017, tm_mon=1, tm_mday=1, tm_hour=1, tm_min=0,
tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
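
Since neither np.datetime64 nor datetime accepts a seconds field of 60, a
timestamp like that has to be handled by hand before parsing. A minimal
sketch of one possible workaround follows; the helper name
parse_utc_with_leap_second is hypothetical, and it folds the leap second onto
the following second (much as calendar.timegm does above), so the
information survives only in the returned flag.

```python
from datetime import datetime, timedelta


def parse_utc_with_leap_second(text):
    """Parse 'YYYY-MM-DD HH:MM:SS', tolerating a seconds field of 60.

    Returns (datetime, is_leap_second). A leap second is mapped onto the
    first second of the following minute.
    """
    date_part, time_part = text.split(' ')
    hours, minutes, seconds = time_part.split(':')
    is_leap_second = seconds == '60'
    if is_leap_second:
        seconds = '59'  # parse the previous second instead
    parsed = datetime.strptime(
        f'{date_part} {hours}:{minutes}:{seconds}', '%Y-%m-%d %H:%M:%S'
    )
    if is_leap_second:
        parsed += timedelta(seconds=1)  # fold onto the next second
    return parsed, is_leap_second


print(parse_utc_with_leap_second('2016-12-31 23:59:60'))
```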

Imagine a user deciding which library to use to store some (life
critical!) measurements taken every 100 ms. He looks at NumPy datetime64,
reads that it is capable of handling attoseconds, and decides that it is a
perfect fit. Now imagine that on 31 Dec 2022 the World Government decides
to inject a leap second. The system will receive the announcement from the
NTP servers and will prepare to replay this second twice. As soon as this
moment chimes in he'll run into a ValueError, which he won't notice because
he's celebrating the New Year :) And guess whom he'll blame? ;)

Actually, humanity has already gotten used to replaying timespans twice. It
happens every year in the countries that observe daylight saving time. And
the solution is to use a more linear scale than local time, namely, UTC.
But now it turns out that UTC is not linear enough either: it also has
certain timespans happening twice.

The solution once again is to use a _really_ linear time scale, which is
TAI. I think the Python 'time' library did the right thing in introducing
time.CLOCK_TAI, after all.
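
To make the idea of a linear scale concrete, here is a minimal sketch of how
elapsed SI seconds between two UTC instants could be obtained from a table of
TAI-UTC offsets. The table below is deliberately partial (it starts in 1999)
and the function names are hypothetical; a real implementation would load the
full, regularly updated leap-seconds.list instead.

```python
from bisect import bisect_right
from datetime import datetime

# Partial table: (UTC instant from which the offset applies, TAI-UTC in s).
# Dates after the last entry are assumed to have no further leap seconds.
TAI_MINUS_UTC = [
    (datetime(1999, 1, 1), 32),
    (datetime(2006, 1, 1), 33),
    (datetime(2009, 1, 1), 34),
    (datetime(2012, 7, 1), 35),
    (datetime(2015, 7, 1), 36),
    (datetime(2017, 1, 1), 37),
]
_STARTS = [start for start, _ in TAI_MINUS_UTC]


def tai_minus_utc(utc):
    """Return TAI-UTC (seconds) in effect at a naive UTC datetime."""
    index = bisect_right(_STARTS, utc) - 1
    if index < 0:
        raise ValueError('date precedes the partial table')
    return TAI_MINUS_UTC[index][1]


def elapsed_si_seconds(start, end):
    """Elapsed SI seconds between two naive UTC datetimes."""
    naive = (end - start).total_seconds()  # assumes 86400 s days
    return naive + tai_minus_utc(end) - tai_minus_utc(start)


# Across the leap second of 2016-12-31 two SI seconds pass, not one.
print(elapsed_si_seconds(datetime(2016, 12, 31, 23, 59, 59),
                         datetime(2017, 1, 1, 0, 0, 0)))  # 2.0
```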

Astropy handles the UTC scale properly though:
>>> t = Time('2016-12-31 23:59:60')
<Time object: scale='utc' format='iso' value=2016-12-31 23:59:60.000>
>>> t0 = Time('2016-12-31 23:59:59')
<Time object: scale='utc' format='iso' value=2016-12-31 23:59:59.000>
>>> delta = t-t0
<TimeDelta object: scale='tai' format='jd' value=1.1574074074038876e-05>
>>> delta.sec
0.9999999999969589
>>> t0 + delta
<Time object: scale='utc' format='iso' value=2016-12-31 23:59:60.000>

So the solution for that particular person with regular intervals of time
is to use astropy. I mention it in the article.
I made some corrections to the text. I'd be grateful if you had a look and
pointed me to the particular sentences
that need improvement.

Best regards,
Lev


On Wed, Dec 29, 2021 at 6:54 PM Stefano Miccoli <stefano.micc...@polimi.it>
wrote:

> Lev, excuse me if I go in super pedantic mode, but your answer and the
> current text of the article fail to grasp an important point.
>
> 1) The proleptic Gregorian calendar is about leap *year* rules. It tracks
> days without making any assumption on the length of days. If we agree on
> using this calendar, dates like -0099-07-12 and 2021-12-29 are defined
> without ambiguity, and we can easily compute the number of days between
> these two dates.
>
> 2) Posix semantics is about the length of a day, and is based on the
> (utterly wrong) assumption that a mean solar day is constant and exactly
> 86400 SI seconds long. (For an authoritative estimate of historical length
> of day variations see <http://astro.ukho.gov.uk/nao/lvm/> and the related
> papers <http://doi.org/10.1098/rspa.2016.0404> and
> <https://doi.org/10.1098/rspa.2020.0776>)
>
> Knowing assumption 1) is important when coding dates before 1582-10-15:
> e.g. 1582-10-04 Julian is 1582-10-14 proleptic Gregorian. Once we agree on
> the proleptic Gregorian calendar everything works as expected: time deltas
> expressed in days are correct.
>
> Knowing assumption 2) is important if we want to compute time deltas
> for date-time objects with high precision: e.g. how many SI seconds occur
> between 1582-10-14T12:00:00 and 1582-10-15T12:00:00 with millisecond
> precision? Here we must first define what T12:00:00 means, say UT1, but
> most critically we need to know the length of day in 1582. With Posix
> semantics a day is always 86400.000 SI seconds long; however, the real value
> of the length of day in 1582 could be about 5 ms less. The problem here is
> that small errors accumulate and if we compute the difference between
> 0000-01-01T12:00:00 and 1900-01-01T12:00:00 the numpy answer may be off by
> about 10_000 seconds.
>
> Fast forward to current times: after 1972 T12:00:00 should be defined as
> UTC, and the posix assumption is correct for almost every day, bar when a
> leap second is added (86401 s) or removed (86399 s, but this has never
> occurred.) Now the numpy computed timedeltas are correct up to an integral
> number of seconds that can be derived from a leap second table, if both
> dates are in the past. If one or both of the dates are in the future, then
> we must rely on models of earth rotation, and estimate the future
> introduction of leap seconds. But earth rotation is quite “unpredictable”,
> so usually this is not very accurate.
>
> The main problem with numpy datetime64 is that by using np.int64 for
> Datetimes it gives 1/2**63 precision (about 1e-19). But this apparent very
> high precision has to be compared with the relative accuracy of the Posix
> semantics, which lies at about 1e-7, 1e-8, if we look at timespans of a
> couple of centuries. So I agree that the np.datetime64 precision is somewhat
> misleading.
>
> This all said, proleptic Gregorian + Posix semantics is, in my opinion,
> the only sensible option in a numerical package like numpy, although the
> results can be inaccurate. However, errors are usually small on average
> (say 10 ms/day which is about 1e-7). Everything more sophisticated is in
> the realm of specialised packages, like AstroPy, but also Skyfield <
> https://rhodesmill.org/skyfield/>.
>
> Stefano
>
> On 28 Dec 2021, at 21:35, numpy-discussion-requ...@python.org wrote:
>
> It is not a matter of formal definitions. Leap seconds are
> uncompromisingly practical.
> If you look at the wall clock on 1 Jan 1970 00:00 and then look at the
> same clock today and measure the difference with atomic clock you won't get
> the time delta that np.timedelta64 reports. There will be a difference of
> ~37 seconds.
>
>
> Actually this should be 27s.
>
> One would expect that a library claiming to work with attoseconds would at
> least count the seconds correctly )
>
> Astropy library calculates
> <https://het.as.utexas.edu/HET/Software/Astropy-1.0/api/astropy.time.TimeGPS.html>
>  them properly:
> "GPS Time. Seconds from 1980-01-06 00:00:00 UTC For example, 630720013.0
> is midnight on January 1, 2000."
> >>> np.datetime64('2000-01-01', 's') - np.datetime64('1980-01-06', 's')
> numpy.timedelta64(630720000,'s')
>
> Everything should be made as simple as possible but not simpler. Leap
> seconds are an inherent part of the world we live in.
>
> Eg this is how people deal with them currently: they have to parse times
> like 23:59:60.209215 manually
>
> https://stackoverflow.com/questions/21027639/python-datetime-not-accounting-for-leap-second-properly
>
>> - calendrical calculations are performed using a proleptic Gregorian
>> calendar <https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar>,
>> - Posix semantics is followed, i.e. each day comprises exactly 86400 SI
>> seconds, thus ignoring the existence of leap seconds.
>>
>> I would also point out that this choice is consistent with python
>> datetime.
>>
> But not consistent with python time ;) "Unlike the time module, the
> datetime module does not support leap seconds."
> • time.CLOCK_TAI
>     International Atomic Time
>    The system must have a current leap second table in order for this to
> give the correct answer. PTP or NTP software can maintain a leap second
> table.
>     Availability: Linux.
>     New in version 3.9.
>
>
>> As regards the promised future support for leap seconds, I would not
>> mention it, for now. In fact leap second support requires a leap second
>> table, which is not available on all platforms supported by numpy.
>> Therefore the leap second table should be bundled and updated with every
>> numpy release, with the very undesirable effect that older versions (with
>> outdated tables) would behave differently from newer ones.
>>
> The Olson database is much larger, yet it is updated on millions of
> computers, phones and what not without causing extra difficulties
> (except when the government unexpectedly decides to shift a region from
> one TZ to another). This way developers have a choice whether
> to work with naive datetimes (ok in a single timezone without
> daylight-saving) or with timezone-aware ones (and take care of updating
> pytz).
>
> This is how astropy deals with updating the table:
> https://docs.astropy.org/en/stable/api/astropy.utils.iers.LeapSeconds.html
> Pytz also has this table both inside the binary tz files and in a text
> file: https://github.com/stub42/pytz/blob/master/tz/leap-seconds.list
> which it in turn downloads from NIST
> ftp://ftp.nist.gov/pub/time/leap-seconds.list
> It is in the public domain, NIST updates this file regularly and it even
> has an expiration date (presently it is 28 June 2022).
> Activation of the 'leap-second-aware mode' could be made dependent on the
> presence of the pytz module and/or this expiration date.
>
> I don't think having a non-default leap-second-aware mode would hurt
> anyone, but I also wouldn't consider it a priority. I think when someone
> needs them he'll make a patch and until that moment it is safe to have them
> as 'proposed' )
>
> I feel that leap seconds should be mentioned somewhere, in the article or
> in the docs, because they limit the practical precise usage of timedelta64
> to the period between 2016 (the last time a leap second was injected) and
> 2021. A modest timespan for a library claiming to work with years
> up to 9.2e18 BC ;)
>
> Thank you for your suggestions! I've included them into the article, plz
> have a look at the updated version.
>
> Best regards,
> Lev
>
>
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: lev.maxi...@gmail.com
>
