Thank you very much for responding to the draft idea! Matthew, your opinions are very educational and enjoyable!
> Great, that is promising. The main difficulty in 'allowing' different
> column types is sorting them efficiently. By efficiently we mean as
> fast, or close to as fast, as radix sorting (actually a counting sort)
> of integers. If there is a way to sort bit64 then it should be fine.
> I'm not quite clear if bit64 is for 64bit machines only or not. But that
> can be switched without too much difficulty.

I am fairly confident that bit64 also supports 32-bit machines, for the following reasons:

1. I can't find any warning about bit64 not supporting 32-bit machines. I can't imagine it would lack 32-bit support without saying so.
2. I did find the compiled bit64.dll in the bit64\libs\i386 folder. If the package didn't compile for 32-bit machines, this folder and DLL wouldn't exist.

As for sorting, on page 9:

*Limitations planned to be removed with the next release*
*• sort is not yet implemented*
*• order is not yet implemented*
*• match is not yet implemented*
*• duplicated is not yet implemented*
*• unique is not yet implemented*
*• table is not yet implemented*
*• as.factor is not yet implemented*

*Further limitations*
*• subscripting non-existing elements and subscripting with NAs is currently not supported. Such subscripting currently returns 9218868437227407266 instead of NA (the NA value of the underlying double code). Following the full R behaviour here would either destroy performance or require extensive C-coding*

1. I'm not sure whether data.table uses its own customized sorting or R's default sorting method. I presume the latter.
2. In the latter case, what bit64 is going to implement becomes critical. I'm not sure whether the author (Dr. Jens Oehlschlägel) plans for something as fast as counting sort.
3. Maybe we can kindly remind him? He would probably be very interested too, because we can tell that he is also a fan of high-performance computing. (Actually, I later found that Dr. Jens Oehlschlägel is also the author of the ff package.)
I sincerely hope he will also be happy to see the great potential of leveraging his new package in the data.table community. :)))
4. Does this imply that data.table could also support double as a key column type once fast bit64 sorting is available, since bit64 is stored internally as double?

> Nope, 64bit R is still limited to 2^31 vector length. What is freed in
> 64bit R is that you can have many more 2^31 vectors in memory at once.
> So a data.table can be 2 billion rows and as many columns that can fit
> in RAM. Remember a 2 billion (2^31) numeric vector is 2^31 * 8 / 1024^3
> = 16GB. That's quite a bit for a single vector! Let's say hardware
> limitations are 128GB of RAM currently (at reasonable cost). With just
> 8 columns and 2 billion rows, your RAM is full anyway with no room for
> copies, let alone the OS itself. In practice the vector length
> limitation rarely bites.

Thank you very much for pointing that out. Aha, that's why I didn't remember the 2^31 vector length being a problem. I couldn't recall the details, though, and so was alarmed when you raised the issue.

Best regards,
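
P.S. For what it's worth, here is a rough check of two numbers quoted above, sketched in Python purely for illustration (none of this is bit64 or data.table code). The variable names are my own. The bit pattern 0x7FF00000000007A2 for R's double NA (a NaN carrying the payload 1954) is my understanding of how R encodes NA_real_, so please treat it as an assumption. If it holds, it would explain why the bit64 manual mentions 9218868437227407266: that is simply the NA double's bits read as a 64-bit integer.

```python
import math
import struct

# 1) Memory needed by one full-length numeric (double) column.
BYTES_PER_DOUBLE = 8
MAX_VECTOR_LEN = 2**31          # vector length limit mentioned in the quote
gib_per_column = MAX_VECTOR_LEN * BYTES_PER_DOUBLE / 1024**3
gib_for_8_columns = 8 * gib_per_column
print(gib_per_column)           # 16.0
print(gib_for_8_columns)        # 128.0

# 2) Assumed bit pattern of R's NA_real_: exponent all ones, payload 1954.
NA_REAL_BITS = 0x7FF00000000007A2

# Reinterpret the same 8 bytes as a signed 64-bit integer and as a double.
raw = struct.pack("<Q", NA_REAL_BITS)
na_as_int64 = struct.unpack("<q", raw)[0]
na_as_double = struct.unpack("<d", raw)[0]
print(na_as_int64)              # 9218868437227407266
print(math.isnan(na_as_double)) # True
```

So 8 full-length double columns would already fill the 128GB mentioned in the quote, and the odd-looking integer in the manual appears to be the double NA seen through integer glasses.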
