On Wed, Jun 8, 2011 at 11:57 AM, Wes McKinney <[email protected]> wrote:
> On Wed, Jun 8, 2011 at 7:36 AM, Chris Barker <[email protected]> wrote:
>> On 6/7/11 4:53 PM, Pierre GM wrote:
>>> Anyhow, each time you read 'frequency' in scikits.timeseries, think
>>> 'unit'.
>>
>> or maybe "precision" -- when I think of unit, I think of something
>> that can be represented as a floating point value -- but here, with
>> integers, it's the precision that can be represented. Just a thought.
>>
>>> Well, it can be argued that the epoch is 0...
>>
>> yes, but that really should be transparent to the user -- what epoch
>> is chosen should influence as little as possible (e.g. only the range
>> of values representable).
>>
>>> Mmh. How would you define a quarter unit? [3M]? But then, what if
>>> you want your year to start in December, say (we often use
>>> DJF/MAM/JJA/SON as a way to decompose a year into four
>>> 'hydrological' seasons, for example)?
>>
>> And the federal fiscal year is Oct - Sept, so the first quarter is
>> (Oct, Nov, Dec) -- clearly that needs to be flexible.
>>
>> -Chris
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> [email protected]
>
> Your guys' discussion is a bit overwhelming for me in my currently
> jet-lagged state ( =) ), but I thought I would comment on a couple of
> things, especially now with the input of another financial Python
> user (great!).
>
> Note that I use scikits.timeseries very little, for a few reasons (a
> bit OT, but...):
>
> - Fundamental need to be able to work with multiple time series,
>   especially performing operations involving cross-sectional data.
>
> - I think it's a bit hard for lay people to use (read: ex-MATLAB/R
>   users). This is just my opinion, but a few years ago I thought
>   about using it and concluded that teaching people how to properly
>   use it (a precision tool, indeed!) was going to cause me grief.
>
> - The data alignment problem, best explained in code:
>
> In [8]: ts
> Out[8]:
> 2000-01-05 00:00:00    0.0503706684002
> 2000-01-12 00:00:00    -1.7660004939
> 2000-01-19 00:00:00    1.11716758554
> 2000-01-26 00:00:00    -0.171029995265
> 2000-02-02 00:00:00    -0.99876580126
> 2000-02-09 00:00:00    -0.262729046405
>
> In [9]: ts.index
> Out[9]:
> <class 'pandas.core.daterange.DateRange'>
> offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
> [2000-01-05 00:00:00, ..., 2000-02-09 00:00:00]
> length: 6
>
> In [10]: ts2 = ts[:4]
>
> In [11]: ts2.index
> Out[11]:
> <class 'pandas.core.daterange.DateRange'>
> offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
> [2000-01-05 00:00:00, ..., 2000-01-26 00:00:00]
> length: 4
>
> In [12]: ts + ts2
> Out[12]:
> 2000-01-05 00:00:00    0.1007413368
> 2000-01-12 00:00:00    -3.5320009878
> 2000-01-19 00:00:00    2.23433517109
> 2000-01-26 00:00:00    -0.34205999053
> 2000-02-02 00:00:00    NaN
> 2000-02-09 00:00:00    NaN
>
> Or ts / ts2 could be completely DateRange-naive (i.e. have no way of
> knowing that they are fixed-frequency), or even out of order, and
> stuff like this will work no problem.
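>
> The rule here is essentially an outer join on the date labels with
> NaN fill. A toy version of the logic (just for illustration -- not
> the actual pandas internals, which operate on ndarrays) might look
> like:
>
> import numpy as np
>
> def toy_add(a, b):
>     # a, b: dicts mapping date -> value. Take the union of the
>     # labels; wherever one side is missing, the result is NaN.
>     out = {}
>     for key in sorted(set(a) | set(b)):
>         out[key] = a.get(key, np.nan) + b.get(key, np.nan)
>     return out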
I view the "fixed frequency" > issue as sort of an afterthought-- if you need it, it's there for you > (the DateRange class is a valid Index--"label vector"--for pandas > objects, and provides an API for defining custom time deltas). Which > leads me to: > > - Inability to derive custom offsets: > > I can do: > > In [14]: ts.shift(2, offset=2 * datetools.BDay()) > Out[14]: > 2000-01-11 00:00:00 0.0503706684002 > 2000-01-18 00:00:00 -1.7660004939 > 2000-01-25 00:00:00 1.11716758554 > 2000-02-01 00:00:00 -0.171029995265 > 2000-02-08 00:00:00 -0.99876580126 > 2000-02-15 00:00:00 -0.262729046405 > > or even generate, say, 5-minutely or 10-minutely date ranges thusly: > > In [16]: DateRange('6/8/2011 5:00', '6/8/2011 12:00', > offset=datetools.Minute(5)) > Out[16]: > <class 'pandas.core.daterange.DateRange'> > offset: <5 Minutes>, tzinfo: None > [2011-06-08 05:00:00, ..., 2011-06-08 12:00:00] > length: 85 > > I'm currently working on high perf reduceat-based resampling methods > (e.g. converting secondly data to 5-minutely data). > > So in summary, w.r.t. time series data and datetime, the only things I > care about from a datetime / pandas point of view: > > - Ability to easily define custom timedeltas > - Generate datetime objects, or some equivalent, which can be used to > back pandas data structures > - (possible now??) Ability to have a set of frequency-naive dates > (possibly not in order). > > This last point actually matters. Suppose you wanted to get the worst > 5-performing days in the S&P 500 index: > > In [7]: spx.index > Out[7]: > <class 'pandas.core.daterange.DateRange'> > offset: <1 BusinessDay>, tzinfo: None > [1999-12-31 00:00:00, ..., 2011-05-10 00:00:00] > length: 2963 > > # but this is OK > In [8]: spx.order()[:5] > Out[8]: > 2008-10-15 00:00:00 -0.0903497960942 > 2008-12-01 00:00:00 -0.0892952780505 > 2008-09-29 00:00:00 -0.0878970494885 > 2008-10-09 00:00:00 -0.0761670761671 > 2008-11-20 00:00:00 -0.0671229140321 > > - W >
I should add that if datetime64 gets me 80% of the way to solving my
needs (which are rather domain-specific), I will be very happy.
Reducing the memory footprint of long time series (versus having
millions of datetime.datetime objects lying around) will also be a
big benefit.
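Back-of-the-envelope on that memory point (exact numbers vary by
platform and Python version, so treat this as a rough sketch):

import sys
from datetime import datetime, timedelta
import numpy as np

n = 1000000
start = datetime(2000, 1, 1)

# a million datetime.datetime objects: roughly 50 bytes apiece plus
# the list's pointer array, i.e. tens of MB
dates = [start + timedelta(seconds=i) for i in range(n)]
approx_py = sys.getsizeof(dates) + n * sys.getsizeof(start)

# the same instants as datetime64 (assuming second resolution here):
# a flat 8 bytes per element
arr = np.array(dates, dtype='datetime64[s]')
approx_np = arr.nbytes  # n * 8, i.e. 8 MB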
