On Tue, Jun 7, 2011 at 6:53 PM, Pierre GM <pgmdevl...@gmail.com> wrote:
> On Jun 8, 2011, at 1:16 AM, Mark Wiebe wrote:
>
> > Hi Dave,
> >
> > Thanks for all the feedback on the datetime; it's very useful for understanding the timeseries ideas, in particular with the many examples you're sprinkling in.
> >
> > One overall impression I have about timeseries in general is the use of the term "frequency" synonymously with the time unit. To me, a frequency is a numerical quantity with a unit of 1/(time unit), so while it's related to the time unit, naming it the same is something the specific timeseries domain has chosen to do. I think the numpy datetime class shouldn't have anything called "frequency" in it, and I would like to remove the current usage of that terminology from the codebase.
>
> True. We rather abused the term in scikits.timeseries, but we meant it as "given time unit".
> Matt came up with the idea of representing a series of consecutive dates as an array of consecutive integers. The conversion integer<>datetime is done internally with an epoch and a unit. Initially, we called the latter "frequency", but in the experimental git version I switched to "unit". Anyhow, each time you read 'frequency' in scikits.timeseries, think 'unit'.

Sounds good.

> > In Wes's comment, he said
> >
> > > I'm hopeful that the datetime64 dtype will enable scikits.timeseries and pandas to consolidate much of the datetime / frequency code. scikits.timeseries has a ton of great stuff for generating dates with all the standard fixed frequencies.
> >
> > implying to me that the important functionality needed in time series is the ability to generate arrays of dates in specific ways. I suspect equating the specification of the array of dates and the unit of precision used to store the date isn't good for either the datetime functionality or supporting timeseries, and I'm presently trying to understand what it is that timeseries uses.
>
> You want a series of 365 consecutive days from today?
> 'now' + np.arange(365). This kind of stuff.

This one works:

>>> np.datetime64('today') + np.arange(365)
array(['2011-06-07', '2011-06-08', '2011-06-09', '2011-06-10',
       '2011-06-11', '2011-06-12', '2011-06-13', '2011-06-14',
       '2011-06-15', '2011-06-16', '2011-06-17', '2011-06-18',
       '2011-06-19', '2011-06-20', '2011-06-21', '2011-06-22',
       '2011-06-23', '2011-06-24', '2011-06-25', '2011-06-26',
       '2011-06-27', '2011-06-28', '2011-06-29', '2011-06-30',
       '2011-07-01', '2011-07-02', '2011-07-03', '2011-07-04',
       '2011-07-05', '2011-07-06', '2011-07-07', '2011-07-08',
       '2011-07-09', '2011-07-10', '2011-07-11', '2011-07-12',
       '2011-07-13', '2011-07-14', '2011-07-15', '2011-07-16',
       '2011-07-17', '2011-07-18', '2011-07-19', '2011-07-20',
       <snip>
       '2012-05-28', '2012-05-29', '2012-05-30', '2012-05-31',
       '2012-06-01', '2012-06-02', '2012-06-03', '2012-06-04',
       '2012-06-05'], dtype='datetime64[D]')
>>>

> > On Tue, Jun 7, 2011 at 7:34 AM, Dave Hirschfeld <dave.hirschf...@gmail.com> wrote:
> >
> > > I think some of the complexity is coming from the definition of the timedelta. In the timeseries package each date simply represents the number of periods since the epoch and the difference between dates is therefore just an integer with no attached metadata - its meaning is determined by the context it's used in. e.g.
>
> Exactly that.
>
> > > timeseries gets on just fine without a timedelta type - a timedelta is just an integer and if you add an integer to a date it's interpreted as the number of periods of that date's frequency. From a usability point of view M1 + 1 is much nicer than having to do something like M1 + ts.TimeDelta(M1.freq, 1). Likewise, the difference between two dates is just an integer.
> >
> > [Mark]
> > I think the timedelta is important, especially with the large number of units NumPy's datetime supports.
> > When you're subtracting two nanosecond datetimes and two minute datetimes in the same code, having the units there to avoid confusion is pretty useful.
>
> Indeed.
>
> > I don't envision 'asfreq' being a datetime function; this is the kind of thing that would layer on top in a specialized timeseries library. The behavior of timedelta follows a more physics-like idea with regard to the time unit, and I don't think something more complicated belongs at the bottom layer that is shared among all datetime uses.
>
> 'asfreq' converts from one unit to another (there's another function, convert, that does not do quite the same thing, but I won't get into details here). You'll probably have to take unit conversion into account if you allow the .view() or .astype() methods on your np.datetime array...

It supports .astype(), with a truncation policy. This is motivated partially because that's how Python's integer division works, and partially because if you consider a full datetime '2011-03-14T13:22:16', it's natural to think of the year as '2011', the date as '2011-03-14', etc., which is truncation. With regard to converting in the other direction, you can think of a datetime as representing a single moment in time, regardless of its unit of precision, and equate '2011' with '2011-01', etc.

> > > In [80]: ts.Date('S', (_64.value + _65.value)//2)
> > > Out[80]: <S : 02-Jul-2011 12:00:00>
> >
> > Adding dates definitely doesn't work, because datetimes have no zero, but I would express it like this:
>
> Well, it can be argued that the epoch is 0... But in scikits.timeseries, keep in mind that underneath, a DateArray is just an array of integers.

Yeah, that's the implementation, but letting the abstraction leak doesn't provide a real benefit I can see.

[Dave]
> > > I really like the idea of being able to specify multiples of the base frequency (e.g. [7D] is equivalent to [W]), not least because it provides an easy way to specify quarters [3M] or seasons [6M], which are important in my work.
> > > NB: I also deal with half-hourly and quarter-hourly timeseries, and I'm sure there are many other examples which are all made possible by allowing multipliers.
>
> Well, the experimental version kinda allowed that...
>
> > This is one of the things where I think mixing the datetime storage precision with timeseries frequency seems counterproductive. Having different origins for datetime64 starting on different weekdays near 1970-01-01 doesn't seem like the right way to tackle the problem to me. I see other valid reasons for reintroducing the origin metadata, but this one I don't really like.
>
> We needed the concept to convert time series, for example from monthly to quarterly (what is the first month of the year (as in succession of 12 months) you want to start with?)

Does that need to be in the underlying datetime for layering a good timeseries implementation on top?

-Mark
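A minimal sketch of the integer-period representation Pierre describes (consecutive dates stored as consecutive integers, converted with an epoch and a unit), using Dave's "M1 + 1" semantics. The `MonthPeriod` class, its methods, and the 1970-01 epoch are hypothetical and purely illustrative; this is not the scikits.timeseries API. The `quarter()` method shows one way the quarter origin Pierre mentions (which month starts the 12-month year) could be handled in a layer above the integer storage.

```python
from __future__ import annotations

from dataclasses import dataclass

EPOCH_YEAR = 1970  # assumed epoch: ordinal 0 is the month 1970-01


@dataclass(frozen=True)
class MonthPeriod:
    """Hypothetical monthly date stored as an integer count of months since 1970-01."""

    ordinal: int

    @classmethod
    def from_ym(cls, year: int, month: int) -> MonthPeriod:
        return cls((year - EPOCH_YEAR) * 12 + (month - 1))

    @property
    def year(self) -> int:
        return EPOCH_YEAR + self.ordinal // 12

    @property
    def month(self) -> int:
        return self.ordinal % 12 + 1

    def __add__(self, n: int) -> MonthPeriod:
        # A bare integer means "n periods of this unit" (here, months).
        return MonthPeriod(self.ordinal + n)

    def __sub__(self, other: MonthPeriod) -> int:
        # The difference of two dates is a plain integer with no unit attached.
        return self.ordinal - other.ordinal

    def quarter(self, start_month: int = 1) -> tuple[int, int]:
        """Return (year, quarter) for quarters whose 12-month year begins in start_month.

        The returned year is the calendar year in which that 12-month span starts.
        """
        shifted = self.ordinal - (start_month - 1)
        return EPOCH_YEAR + shifted // 12, (shifted % 12) // 3 + 1

    def __str__(self) -> str:
        return f"{self.year:04d}-{self.month:02d}"


m1 = MonthPeriod.from_ym(2011, 6)
print(m1 + 1, (m1 + 7) - m1, m1.quarter(), m1.quarter(start_month=10))
```

Whether such an origin belongs in the datetime type itself or in a layer like this is the open question Mark raises at the end of the thread; the sketch does not settle it.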
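The unit-carrying timedelta and the `.astype()` truncation policy Mark describes can be tried with NumPy's `datetime64`/`timedelta64` as eventually released. This sketch assumes NumPy 1.7 or later; the 2011 development branch under discussion may have behaved differently, and the timestamps are illustrative.

```python
import numpy as np

# Subtracting datetimes yields a timedelta64 that keeps its unit, so a
# nanosecond difference and a minute difference cannot be confused.
dt_ns = (np.datetime64("2011-06-07T12:00:00.000000500", "ns")
         - np.datetime64("2011-06-07T12:00:00", "ns"))
dt_m = np.datetime64("2011-06-07T13:30", "m") - np.datetime64("2011-06-07T12:00", "m")
print(repr(dt_ns), repr(dt_m))

# Comparisons convert both operands to a common unit first.
print(dt_m > dt_ns)

# astype() to a coarser unit truncates, like reading off the year or the date.
full = np.datetime64("2011-03-14T13:22:16")
year = full.astype("datetime64[Y]")
day = full.astype("datetime64[D]")

# Truncation floors (like Python's //), so a pre-1970 time stays on its own day.
pre = np.datetime64("1969-12-31T23:00", "m").astype("datetime64[D]")

# The other direction: a coarse datetime compares equal to the start of its span.
print(year, day, pre, np.datetime64("2011") == np.datetime64("2011-01-01"))
```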
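Adding two datetimes has no meaning, as Mark says, but a midpoint like the one in the `ts.Date(...)` example can be written as an offset from one endpoint. Mark's own expression does not appear in the quoted text; the snippet below is an assumed illustration with made-up dates and NumPy 1.7+ semantics, not a reconstruction of what he wrote.

```python
import numpy as np

start = np.datetime64("2011-07-01T00:00:00", "s")
end = np.datetime64("2011-07-04T00:00:00", "s")

# datetime + datetime is rejected by NumPy: there is no zero to anchor the sum.
try:
    start + end
    adding_dates_works = True
except TypeError:
    adding_dates_works = False

# datetime - datetime -> timedelta64[s]; dividing by an integer keeps the unit,
# and datetime + timedelta -> datetime.
half = (end - start) / 2
midpoint = start + half
print(midpoint)
```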
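On Dave's [7D]-versus-[W] point and Mark's remark about origins near 1970-01-01: that date was a Thursday, so seven-day bins counted from it start on Thursdays, and a Monday- or Sunday-based week needs a shifted origin. A minimal sketch, assuming a hypothetical `week_index` helper that applies the shift on top of a plain day count rather than inside the datetime type:

```python
import datetime as dt

EPOCH = dt.date(1970, 1, 1)  # a Thursday: EPOCH.weekday() == 3


def week_index(day: dt.date, week_start: int = 0) -> int:
    """Hypothetical helper: whole weeks since the week containing 1970-01-01,
    where weeks begin on `week_start` (0 = Monday ... 6 = Sunday)."""
    days = (day - EPOCH).days
    # Shift so that each 7-day bin begins on week_start, then floor-divide.
    offset = (EPOCH.weekday() - week_start) % 7
    return (days + offset) // 7


print(week_index(dt.date(1970, 1, 4)), week_index(dt.date(1970, 1, 5)))
```

With `week_start=3` (Thursday) the offset is zero and the result is simply the day count floor-divided by 7; floor division also keeps pre-1970 dates in the correct bin.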
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion