On Tue, Oct 13, 2015 at 3:21 PM, Nathaniel Smith <n...@pobox.com> wrote:
> > If you are going to make datetime64 more like datetime.datetime, please
> > consider adding the "fold" bit. See PEP 495. [1]
>
> The challenge here is that we literally do not have a bit to use :-)

hmm -- I was first thinking that this could all be in the timezone stuff
(when we get there), but while I imagine we'll want an entire array to be
in a single timezone, each individual value would need its own "fold" flag.

But in any case, we don't need it 'till we do timezones, and my
understanding is that we aren't going to do timezones until we have the
mythical new-and-improved-dtype-system.

So a future datetime dtype could be 64 bits + a byte of extra info, or be
63 bits plus the fold flag, or...

> Unless we make it datetime65 + 63 bits of padding, stealing a bit to use
> for fold would halve the range of representable times, and I'm guessing
> this would not be acceptable?

well, not now, with the fixed epoch, but if the epoch could be adjusted,
maybe a smaller range would be fine -- who needs nanosecond accuracy AND
centuries of range?

Thinking a bit more here:

For those that didn't follow the massive discussion on this on Python-dev
and the new datetime list:

the fold flag is required to round-trip properly for timezones with
discontiguous time -- i.e. daylight saving time. So if you have:

2015-11-01T01:30

Do you mean the first 1:30 am or the second one, after the DST transition?
(i.e. in the fold, or not?)

So it is key, for Python's datetime, to make sure to keep that information
around.

However: Python's datetime was designed to be optimized for:

- converting between datetime and other representations in databases, etc.
- fast math for "naive time" -- i.e. basic manipulations within the same
  timezone, like "one day later"

Fast math for absolute deltas is of secondary concern.

The result of this is that datetime stores:

year, month, day, hour, minute, second, microsecond

It does NOT store some time_unit_since_an_epoch, like unix time or numpy
datetime64.
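
To make the ambiguity concrete, here is a minimal sketch using the `fold` attribute that PEP 495 added to `datetime` (Python 3.6+). `ToyEastern` is a hypothetical, hand-written tzinfo that models only the 2015-11-01 US Eastern fall-back; it is an assumption for illustration, not a real timezone database.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical zone: models only the 2015-11-01 US Eastern fall-back."""

    # Wall-clock hour 01:00-02:00 occurs twice on this day.
    FOLD_START = datetime(2015, 11, 1, 1, 0)
    FOLD_END = datetime(2015, 11, 1, 2, 0)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)  # compare on wall-clock fields only
        if wall < self.FOLD_START:
            return timedelta(hours=-4)  # still on daylight time
        if wall < self.FOLD_END:
            # Ambiguous hour: fold=0 is the first pass, fold=1 the second.
            return timedelta(hours=-5 if dt.fold else -4)
        return timedelta(hours=-5)      # back on standard time

    def dst(self, dt):
        return self.utcoffset(dt) + timedelta(hours=5)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


tz = ToyEastern()
first = datetime(2015, 11, 1, 1, 30, tzinfo=tz)  # fold=0 by default
second = first.replace(fold=1)

# Same stored year/month/day/hour/minute fields, different instants in UTC.
print(first.astimezone(timezone.utc).isoformat())   # 2015-11-01T05:30:00+00:00
print(second.astimezone(timezone.utc).isoformat())  # 2015-11-01T06:30:00+00:00
```

Without the `fold` attribute, the two values would be indistinguishable, because every stored field is identical.
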
Also, IIUC, when you associate a datetime with a timezone, it stores the
year, month, day, hour, minute, second, ... in the specified timezone --
NOT in UTC, or anything else. This makes manipulations within that timezone
easy -- getting the next day simply requires adding one to the day field
(then normalizing against the month).

Given all that -- the "fold" bit is needed, as a particular datetime in a
particular timezone may have more than one meaning.

Note that to compute a proper time span between two "aware" datetimes, it
is necessary to convert to UTC, do the math, then convert back to the
timezone you want.

However, numpy datetime is optimized for compact storage and fast
computation of absolute deltas (actual hours, minutes, seconds... not
calendar units like "the next day").

Because of this, and because it's what we already have, datetime64 stores
times as "some number of time units since an epoch" -- a simple integer.

And because we probably want fast absolute delta computation, when we add
timezones, we'll probably want to store the datetime in UTC, and apply the
timezone on I/O.

Alexander: Am I right that we don't need the "fold" bit in this case?
You'd still need it when specifying a time in a timezone with folds -- but
again, only on I/O.

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
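
As an illustration of the "store in UTC, apply the timezone on I/O" idea in the message above, here is a minimal pure-Python sketch. It is not numpy's implementation: `to_wall`, `TRANSITION_UTC`, and the two fixed-offset zones are hypothetical names, and only the 2015-11-01 US Eastern fall-back is modelled.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only the 2015-11-01 US Eastern fall-back is modelled.
TRANSITION_UTC = datetime(2015, 11, 1, 6, 0, tzinfo=timezone.utc)
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")


def to_wall(epoch_seconds):
    """Render a stored UTC integer as local wall time (output side only)."""
    utc = datetime.fromtimestamp(epoch_seconds, timezone.utc)
    return utc.astimezone(EDT if utc < TRANSITION_UTC else EST)


# Storage is a plain integer count of seconds since the Unix epoch, in UTC.
first = int(datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc).timestamp())
second = first + 3600

# Both integers display as 01:30 local time, yet they are distinct values,
# so the stored form carries no ambiguity and needs no fold flag.
print(to_wall(first).isoformat())   # 2015-11-01T01:30:00-04:00
print(to_wall(second).isoformat())  # 2015-11-01T01:30:00-05:00

# Absolute deltas are plain integer subtraction.
print(second - first)               # 3600
```

The fold question only reappears in the opposite direction: parsing the wall time "01:30" back into an integer requires knowing which of the two instants was meant.
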
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion