Ralf Gommers <ralf.gommers <at> gmail.com> writes:

> On Fri, Nov 8, 2013 at 8:22 PM, Charles R Harris <charlesr.harris <at> gmail.com> wrote:
>
> > and think that the main thing missing at this point is fixing the datetime problems.
>
> Is anyone planning to work on this? If yes, you need a rough estimate of when this is ready to go. If no, it needs to be decided if this is critical for the release. From the previous discussion I tend to think so. If it's critical but no one does it, why plan a release.......
>
> Ralf

Just want to pipe up here as to the criticality of the datetime bug.

Below is a minimal example from some data analysis code I found in our company that was giving incorrect results (fortunately it was caught by thorough testing):

    In [110]: records = [
         ...:     ('2014-03-29 23:00:00', '2014-03-29 23:00:00'),
         ...:     ('2014-03-30 00:00:00', '2014-03-30 00:00:00'),
         ...:     ('2014-03-30 01:00:00', '2014-03-30 01:00:00'),
         ...:     ('2014-03-30 02:00:00', '2014-03-30 02:00:00'),
         ...:     ('2014-03-30 03:00:00', '2014-03-30 03:00:00'),
         ...:     ('2014-10-25 23:00:00', '2014-10-25 23:00:00'),
         ...:     ('2014-10-26 00:00:00', '2014-10-26 00:00:00'),
         ...:     ('2014-10-26 01:00:00', '2014-10-26 01:00:00'),
         ...:     ('2014-10-26 02:00:00', '2014-10-26 02:00:00'),
         ...:     ('2014-10-26 03:00:00', '2014-10-26 03:00:00')]
         ...:
         ...:
         ...: data = np.asarray(records, dtype=[('date obj', 'M8[h]'), ('str repr', object)])
         ...: df = pd.DataFrame(data)

    In [111]: df
    Out[111]:
                 date obj             str repr
    0 2014-03-29 23:00:00  2014-03-29 23:00:00
    1 2014-03-30 00:00:00  2014-03-30 00:00:00
    2 2014-03-30 00:00:00  2014-03-30 01:00:00
    3 2014-03-30 01:00:00  2014-03-30 02:00:00
    4 2014-03-30 02:00:00  2014-03-30 03:00:00
    5 2014-10-25 22:00:00  2014-10-25 23:00:00
    6 2014-10-25 23:00:00  2014-10-26 00:00:00
    7 2014-10-26 01:00:00  2014-10-26 01:00:00
    8 2014-10-26 02:00:00  2014-10-26 02:00:00
    9 2014-10-26 03:00:00  2014-10-26 03:00:00

Note that the `date obj` column has been adjusted for the local timezone, which produces a duplicate value at the clock change in March and a missing value at the clock change in October.
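For anyone who wants to guard against this in their own test suite, here is a rough sketch of the kind of check that would flag it. It uses only the standard library and the values copied from the output above. The helper names `shifted_rows` and `duplicated_values` are made up for illustration and are not part of numpy or pandas.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (date obj, str repr) pairs copied from the DataFrame output above.
rows = [
    ("2014-03-29 23:00:00", "2014-03-29 23:00:00"),
    ("2014-03-30 00:00:00", "2014-03-30 00:00:00"),
    ("2014-03-30 00:00:00", "2014-03-30 01:00:00"),
    ("2014-03-30 01:00:00", "2014-03-30 02:00:00"),
    ("2014-03-30 02:00:00", "2014-03-30 03:00:00"),
    ("2014-10-25 22:00:00", "2014-10-25 23:00:00"),
    ("2014-10-25 23:00:00", "2014-10-26 00:00:00"),
    ("2014-10-26 01:00:00", "2014-10-26 01:00:00"),
    ("2014-10-26 02:00:00", "2014-10-26 02:00:00"),
    ("2014-10-26 03:00:00", "2014-10-26 03:00:00"),
]


def shifted_rows(pairs):
    """Indices where the parsed value differs from a naive reading of the string."""
    return [
        i
        for i, (parsed, text) in enumerate(pairs)
        if datetime.strptime(parsed, FMT) != datetime.strptime(text, FMT)
    ]


def duplicated_values(pairs):
    """Parsed values that occur more than once, in order of their repeat."""
    seen, dupes = set(), []
    for parsed, _ in pairs:
        if parsed in seen:
            dupes.append(parsed)
        seen.add(parsed)
    return dupes


print(shifted_rows(rows))       # [2, 3, 4, 5, 6]
print(duplicated_values(rows))  # ['2014-03-30 00:00:00']
```

Rows 2 to 6 are the ones that no longer match their own string representation, and the repeated `2014-03-30 00:00:00` is the duplicate mentioned above.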
As you can imagine, this could very easily lead to incorrect analysis.

If you ran this exact same code in the (Eastern) US, you'd see the following results:

                 date obj             str repr
    0 2014-03-30 03:00:00  2014-03-29 23:00:00
    1 2014-03-30 04:00:00  2014-03-30 00:00:00
    2 2014-03-30 05:00:00  2014-03-30 01:00:00
    3 2014-03-30 06:00:00  2014-03-30 02:00:00
    4 2014-03-30 07:00:00  2014-03-30 03:00:00
    5 2014-10-26 03:00:00  2014-10-25 23:00:00
    6 2014-10-26 04:00:00  2014-10-26 00:00:00
    7 2014-10-26 05:00:00  2014-10-26 01:00:00
    8 2014-10-26 06:00:00  2014-10-26 02:00:00
    9 2014-10-26 07:00:00  2014-10-26 03:00:00

Unfortunately I don't have the skills to meaningfully contribute in this area, but it is a very real problem for users of numpy, many of whom are not active on the mailing list.

HTH,
Dave