On Friday 11 July 2008, Christopher Barker wrote:
> Francesc Alted wrote:
> > We are planning to implement some date/time types for NumPy,
>
> +1
>
> A couple questions/comments:
>
> > ``datetime64``
> >   - Expressed in microseconds since POSIX epoch (January 1, 1970).
> >
> >   - Resolution: nanoseconds.
>
> how is that possible? Is that a typo?

Exactly, it is a typo. This should read *microseconds*. I have already
sent a corrected version.

> > This will be compatible with the Python ``datetime`` module
>
> very important!
>
> > Observations::
> >
> >   This will not be fully compatible with the Python
> >   ``datetime`` module neither in terms of precision nor time-span.
> >   However, getters and setters will be provided for it (losing
> >   precision or overflowing as needed).
>
> How do you propose handling overflowing? Would it raise an exception?

Yes. We propose to use exactly the same exception handling as NumPy
(so it will be configurable by the user).

> Another option would be to have a version that stored the datetime in
> two values: say two int64s or something (kind of like complex numbers
> are handled). This would allow a long time span and nanosecond (or
> finer) precision. I guess it would require a bunch of math code to be
> written, however.

I suppose so, yes. Besides, this certainly violates the requirement of
having a fast implementation (unless we want to spend a lot of time
optimizing such a 'complex' date/time type). There is also the problem
of requiring more space. See below.

> > * ``timefloat64``
> >   - Resolution: 1 microsecond (for +-32 years from epoch) or 14
> >     digits (for distant years from epoch). So the precision is
> >     *variable*.
>
> I'm not sure this is that useful, exactly for that reason. What's the
> motivation for it? I can see using a float for timedelta -- as, in
> general, you'll need less precision the longer your time span, but
> having precision depend on how far you happen to be from the epoch
> seems risky (though for anything I do, it wouldn't matter in the
> least).

Well, as I said before, we wanted this mainly for
geological/astronomical uses, but since this type has microsecond
resolution during the years [1902 - 2038], it would definitely be
useful for many other cases too.

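To make the variable precision concrete, here is a minimal sketch (not
part of the proposal). It assumes, hypothetically, that the value is
stored as float64 *seconds* since the epoch; the helper name
``float64_resolution`` is made up for illustration, and the exact
cutoff years depend on the unit actually chosen for the stored value.

```python
import math  # math.ulp requires Python 3.9 or later

# Julian year in seconds, used only for rough scaling.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def float64_resolution(years_from_epoch):
    """Spacing, in seconds, between adjacent float64 values at this offset.

    Assumption: time is stored as float64 seconds counted from the epoch.
    """
    seconds = abs(years_from_epoch) * SECONDS_PER_YEAR
    # math.ulp gives the gap to the next representable float64 value.
    return math.ulp(seconds)


# The spacing grows as we move away from the epoch.
for years in (1, 32, 1000, 1e6, 1e9):
    print(f"{years:>12g} years from epoch: {float64_resolution(years):.3e} s")
```

Under this assumption the spacing stays below one microsecond near the
epoch and grows steadily for distant dates, which is the trade-off
being discussed here.
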
I can say that Postgres, for one, implements a datetime type based on a
float64 by default (although you can choose an int64 at compilation
time) with exactly the same properties as ``timefloat64``. So, if
Postgres is doing this, it should definitely be useful in many use
cases.

> > Example of use
> >
> >   In [11]: t[0] = datetime.datetime.now()  # setter in action
> >
> >   In [12]: t[0]
> >   Out[12]: 733234384724   # representation as an int64 (scalar)
>
> hmm - could it return a numpy.datetime object instead, rather than a
> straight int64? I'd like to see a representation that is clearly
> datetime.

Could be. But we should not forget that we are implementing the type
for an array package, and the output can become cumbersome very
quickly. What I wanted to avoid here was having this:

  [datetime(2008, 7, 11, 19, 16, 10, 996509),
   datetime(2008, 7, 11, 19, 16, 10, 996535),
   datetime(2008, 7, 11, 19, 16, 10, 996547),
   datetime(2008, 7, 11, 19, 16, 10, 996559),
   datetime(2008, 7, 11, 19, 16, 10, 996568), dtype="datetime64"]

I prefer to see this:

  [733234000000, 733234000000, 733234000000, 733234000000,
   733234000000, dtype="datetime64"]

Hmm, although for a scalar representation, I agree that this is a bit
too terse. Maybe adding a 'T' (meaning 'T'ime type) at the end would be
better?:

  In [12]: t[0]
  Out[12]: 733234384724T

and hence:

  [733234000000T, 733234000000T, 733234000000T, 733234000000T,
   733234000000T, dtype="datetime64"]

But it would be interesting to see what other people think.

> > About the ``mx.DateTime`` module
> > --------------------------------
> >
> > In this document, the emphasis has been put in comparing the
> > compatibility of future NumPy date/time types against the
> > ``datetime`` module that comes with Python. Should we consider the
> > compatibility with mx.DateTime as well?
>
> No.
> The whole point of python's standard datetime is to have a common
> system with which to deal with date-time values -- it's too bad it
> didn't come sooner, so that mx.DateTime could have been built on it,
> but at this point, I think supporting the standard lib one is most
> important.

I see.

> I couldn't find documentation (not quickly, anyway) of how the
> datetime object stores its data internally, but it might be nice to
> support that protocol directly -- maybe that would make for too much
> math code to write, though.

The internal format of the datetime module is documented in the
sources and, at first sight, supporting the protocol shouldn't be too
difficult.

> What about timedelta types?

Well, we have deliberately left timedelta out because we think that any
of the three proposed types can act as a timedelta (this is another
reason for keeping the proposed representation, i.e. not showing
year/month/day/etc. info). In fact, they represent an absolute time
only by the convention of placing the origin of time at the UNIX epoch.
If you don't impose this convention on your array, all of the time
types can represent timedeltas.

However, I suppose that there is a problem with the getters and setters
here, that is, how external ``datetime`` timedeltas interact with the
new NumPy date/time types. Thinking a bit, the setter should be
relatively easy to implement:

  In [37]: numpy.datetime64(datetime.timedelta(12))
  Out[37]: 12T

For the getter, one could think of adding a new method (only available
for the date/time types):

  In [38]: t = numpy.datetime64(datetime.timedelta(12))

  In [39]: t.totimedelta()
  Out[39]: datetime.timedelta(12)

IMO, that would solve the issue without having to implement specific
timedelta types.

> My final thought is that while I see that different applications need
> different properties, having multiple representations seems like it
> will introduce a lot of maintenance, documentation and support
> issues.
> Maybe a single, more complicated representation would be a
> better bet (like using two ints, rather than one, to get both range
> and precision)

Yeah, but besides the fact that the implementation would be
considerably slower, such structs of two 'int64' values would take
twice the space of the proposed time types, and this can be a killer
for a package that is meant for dealing with large arrays of data.
[Incidentally, I was even considering introducing a 32-bit date/time
type precisely to save space, but as the usability of such a type
would be really restricted, in the end I opted not to include it.]

> Thanks for working on this -- I think it will be a great addition to
> numpy!

Thanks for the excellent feedback too!

--
Francesc Alted
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion