Mark Wiebe wrote:
> Because datetime64 is a NumPy data type, it needs a well-defined rule
> for these kinds of conversions. Treating datetimes as moments in time
> instead of time intervals makes a very nice rule which appears to be
> very amenable to a variety of computations, which is why I like the
> approach.
This really is the key issue that I've been harping on (sorry...) this whole thread: for many uses, a datetime as a moment in time is a great abstraction, and I think that is how most datetime implementations (like the std lib one) are used. However, when you are trying to represent or work with data like monthly averages and the like, you need something that represents something else -- and trying to use the same mechanism as for time instants, hoping the ambiguities will resolve themselves from context, is dangerous.

I don't work in finance, so I'm not sure about things like bi-monthly payments -- it seems those could well be defined as instants: the payments are due on a given day each month (say the 1st and 15th), and I assume that is well defined to the instant, i.e. before the end of the day in some time zone. (Note that that would be a time of 23:59:59.99999..., rather than zero, however.) The trick with these comes in when you do math -- the timedelta issue. What is a one-month timedelta? It is NOT a given number of days, hours, etc.

I don't know that anyone has time to do this, but it seems a written-up set of use cases would help focus this conversation -- I've pretty much lost track of what uses we are trying to support.

Another question: can you instantiate a datetime64 with something other than a string? i.e. a (year, [month], [day], [hour], [minute], [second], [usecond]) tuple?

> The fact that it's a NumPy dtype probably is the biggest limiting
> factor preventing parameters like 'start' and 'end' during conversion.
> Having a datetime represent an instant in time neatly removes any
> ambiguity, so converting between days and seconds as a unit is
> analogous to converting between int32 and float32.

Sure, but I don't know that that is the best way to go -- integers are precisely defined and generally used such that 3 == 3.00000000. That's not the case for months, at least if it's supposed to be a monthly-average-type representation.
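Regarding the instantiation question above: for reference, here is how construction works in the NumPy that eventually shipped (so this may not match the API being discussed in this thread). As far as I know there is no (year, month, day, ...) tuple constructor, but strings, epoch offsets, and stdlib datetime objects all work -- a minimal sketch:

```python
import datetime

import numpy as np

# From an ISO 8601 string:
d1 = np.datetime64('2011-06-15T12:00')

# From an integer plus an explicit unit -- the integer is an offset
# from the Unix epoch (1970-01-01):
d2 = np.datetime64(400, 'D')  # 400 days after the epoch

# No tuple constructor, but you can go through the stdlib datetime
# module to build from components:
d3 = np.datetime64(datetime.datetime(2011, 6, 15, 12, 0))
```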
This reminds me of a question recently on this list -- someone was using np.histogram() to bin integer values and was surprised at the results. What they needed to do was define the bin edges as floating-point numbers to get what they wanted: 0.5, 1.5, 2.5, rather than 1, 2, 3, 4 -- because what they really wanted was a categorical treatment of the integers, NOT truncated floating-point numbers. I'm not sure how that informs this conversation, though...

> > >>> np.timedelta64(10, 's') + 10
> > numpy.timedelta64(20,'s')
> >
> > Here, the unit is defined: 's'
>
> For the first operand, the inconsistency is with the second. Here's
> the reasoning I didn't spell out:
> We're adding a timedelta + int, so let's convert 10 into a timedelta.
> No units specified, so it's 10 microseconds, so we add 10 seconds and
> 10 microseconds, not 10 seconds and 10 seconds.

This sure seems ripe for error to me -- if datetimes and timedeltas are going to be represented in various possible units, then I don't think it's a good idea to allow adding a bare integer to one -- especially if the unit can be inferred from the input data, rather than specified.

"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

If you must allow this, then using the default for the unspecified unit, as above, is the way to go.

Dave Hirschfeld wrote:
>> Here are some current behaviors that are inconsistent with the microsecond
>> default, but consistent with the "generic time unit" idea:
>> >>> np.timedelta64(10, 's') + 10
>> numpy.timedelta64(20,'s')
>
> That is what I would expect (and hope) would happen. IMO an integer should be
> cast to the dtype ([s]) of the datetime/timedelta.

Again, this seems way too ripe for error, particularly if the unit is auto-determined from the input data.
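For what it's worth, the behavior that eventually shipped in NumPy (which may differ from the proposal under discussion in this thread) matches Dave's expectation: a bare integer is cast to the unit of the other operand, not to a microsecond default. A quick sketch:

```python
import numpy as np

# The bare integer picks up the unit of the timedelta it is added to,
# so this is 10 seconds + 10 seconds:
td = np.timedelta64(10, 's') + 10
# -> numpy.timedelta64(20,'s')

# The same unit-casting applies in datetime64 arithmetic -- here the
# integer is treated as a count of days:
d = np.datetime64('1970-01-01', 'D') + 3
# -> numpy.datetime64('1970-01-04')
```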
Not to take us back to a probably-already-resolved issue, but maybe all this unit conversion could and should be avoided by following the Python datetime approach: all datetimes and timedeltas are always defined with microsecond precision -- period. Maybe there are computational inefficiencies there that we want to avoid. It would also preclude any use of these dtypes for work that requires greater precision -- but does anyone really need both year-month-day specification AND nanoseconds? Given all the leap-second issues, that seems a bit ridiculous. But it would make things easier.

I note that in this entire conversation, all the talk has been about finance examples -- I think I'm the only one who has brought up science use, and that only barely (and mostly the simple cases). So do we really need the same dtype to be useful for both finance and particle physics?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA  98115       (206) 526-6317  main reception

chris.bar...@noaa.gov

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion