OK, but sorting on POSIXct (double) should be less efficient than sorting on int64 (via a radix sort), shouldn't it?
Additionally, what do you think of adding an IMonth class (displayed like "2011-02")? When grouping, at present we can use month, but it does not distinguish the year. An IMonth could be quick and useful for statistics computed by group.

Regards

2013/1/3 Matthew Dowle <[email protected]>

> Hi,
>
> One reason 'double' type was added to setkey was to allow POSIXct in keys.
> That was as recently as v1.8.2 :
>
>     o   Numeric columns (type 'double') are now allowed in keys and ad hoc
>         by. J() and SJ() no longer coerce 'double' to 'integer'. i join
>         columns which mismatch on numeric type are coerced silently to match
>         the type of x's join column. Two floating point values
>         are considered equal (by grouping and binary search joins) if their
>         difference is within sqrt(.Machine$double.eps), by default. See
>         example in ?unique.data.table. Completes FRs #951, #1609 and #1075.
>         This paves the way for other atomic types which use 'double' (such
>         as POSIXct and bit64).
>         Thanks to Chris Neff for beta testing and finding problems with keys
>         of two numeric columns (bug #2004), fixed and tests added.
>
> So, POSIXct, or using integer64 to store YYYYMMDDHHMMSSmmm is another
> possibility (no epoch has some pros as well as cons), or date and time
> held in separate columns.
>
> The thinking is, rightly or wrongly, that R already supports milliseconds
> in various ways. data.table doesn't aim to prescribe which datetime class
> you place in the data.table; it's up to you what you use. It only has
> IDate because Date in R is (oddly) stored as numeric rather than integer
> which (I at least) have never really understood. For a long time
> data.table only supported integer columns in keys and joins (including
> factors which are integers/enumerations). But now double (and character)
> are fine in keys too.
>
> So to answer your question as asked:
> as.POSIXct("2010-01-03 09:34:54.342697") already works. But note :
>
> http://stackoverflow.com/questions/10931972/r-issue-with-rounding-milliseconds
> http://stackoverflow.com/questions/11136340/zoo-xts-microsecond-read-issue
> http://stackoverflow.com/questions/8889554/milliseconds-puzzle-when-calling-strptime-in-r
> http://stackoverflow.com/questions/2150138/how-to-parse-milliseconds-in-r
>
> HTH, also :
>
> http://stackoverflow.com/a/14063077/403310
>
> But yes I'm sure we can do better, just not quite sure precisely how.
>
> Matthew
>
> On 03.01.2013 11:17, colin umansky wrote:
>
> Hello,
> I have been thinking about how data.table deals with date-times and would
> like to share my questions/opinions.
>
> Where I think data.table is (likely to be wrong :))
> At the moment data.table deals independently with IDate and ITime
> (%H:%M:%S), which are simple (Matthew Dowle's words) derived classes. As I
> understand it, they are stored as integers to enable fast radix sorting,
> etc.
> There is no milli/micro/nanosecond resolution, which is a problem as far
> as financial time series are concerned.
>
> Suggestions:
> Would it be possible to store an IDateTime as the number of microseconds
> since the epoch?
> An IDateTime object would be represented like
> a = as.IDateTime("2010-01-03 09:34:54.342697"), then
>   year:  as.IYear(a);  # would display "2010"
>   month: as.IMonth(a); # would display "2010-01"
>   date:  as.IDate(a);  # would display "2010-01-03"
>   etc...
> Having all those built-in types would probably be useful for efficient
> grouping.
>
> PS:
> The best software I have used for dealing with time series data is kdb
> (http://kx.com/).
> I particularly like the way datetimes are handled
> (http://code.kx.com/wiki/JB:QforMortals/atoms#time); it may be a source of
> inspiration...
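
To make the year-month grouping idea from the top of this message concrete, here is a minimal sketch in base R only. The helper name as_year_month is hypothetical; IMonth is only a proposal in this thread and is not an existing data.table class. Here the key is a plain character vector rather than an integer-backed type, so it illustrates the grouping behaviour, not the sorting efficiency under discussion.

```r
# Hypothetical helper (not part of data.table): build a "YYYY-MM" key
# from a POSIXct vector using base R formatting.
as_year_month <- function(x) format(x, "%Y-%m")

# Sample timestamps with fractional seconds; tz fixed so results are reproducible.
ts <- as.POSIXct(c("2010-01-03 09:34:54.342697",
                   "2010-01-17 10:00:00.000000",
                   "2011-01-05 12:30:00.500000"), tz = "UTC")
value <- c(1, 2, 10)

# Month alone merges January 2010 with January 2011 into one group.
month_only <- format(ts, "%m")
sum_by_month <- tapply(value, month_only, sum)

# The year-month key keeps the two Januaries apart.
ym <- as_year_month(ts)
sum_by_ym <- tapply(value, ym, sum)

print(sum_by_month)
print(sum_by_ym)
```

With a data.table, the same key could be computed inside by, assuming the timestamp column is POSIXct; an integer-backed class would be needed to get the radix-sort benefit mentioned above.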
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
