I've been using data.table heavily since useR! 2012. For the most part it's been nothing short of a magical panacea, until today.
I ran into a strange problem: after I call setkey on a data table, the saved file is far larger than it should be. I spent hours trying to reproduce this with similar data that I could share, but I couldn't get it to happen on simulated data.

Here is an outline of my process:

1. Data table 1 (DT1) is about 80 MB when I save it.
2. Data table 2 (DT2) is about 10 MB when I save it.
3. Data table 3 (DT3) = cbind(DT1, DT2).
4. DT3 is about 90 MB when I save it (so far so good).
5. If I set the key of DT3 to one particular column (for me it's isotime), the saved file suddenly takes 212 MB of disk space.
6. If I then change the key to something else, or set it to NULL, the saved file still takes 212 MB.
7. HOWEVER, if I never set DT3's key to isotime, and set it to another column instead (like a "name" field), then the file only takes about 90 MB, as expected.

The ballooning only shows up in the saved file. The actual in-memory sizes of these data sets are about right.

I need to step out, but I can give more information tomorrow if you would like.

I'm using R 2.15.1 "Roasted Marshmallows" on a Windows 7 machine. The package version is data.table 1.8.2.
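For reference, the steps above can be sketched roughly as follows on simulated data. This is only an illustration of the comparison, not a reproduction: as noted, simulated data has not shown the growth for me. The column contents, the row count, and the helper saved_size() are made up for the example; only the column names isotime and name come from my real tables.

```r
library(data.table)

set.seed(1)
n <- 1e5  # hypothetical row count, far smaller than the real tables

# Simulated stand-ins for DT1 and DT2 (contents are assumptions).
iso <- format(as.POSIXct("2012-01-01", tz = "UTC") + sample(n),
              "%Y-%m-%dT%H:%M:%S")
DT1 <- data.table(isotime = iso, value = rnorm(n))
DT2 <- data.table(name = sample(letters, n, replace = TRUE))

DT3 <- cbind(DT1, DT2)

# Hypothetical helper: save an object to a temporary file and
# return the size of that file in bytes.
saved_size <- function(x) {
  f <- tempfile(fileext = ".RData")
  on.exit(unlink(f))
  save(x, file = f)
  file.info(f)$size
}

# Work on copies so that each variant starts from the unkeyed table.
by_name <- copy(DT3)
setkey(by_name, name)

by_isotime <- copy(DT3)
setkey(by_isotime, isotime)

key_removed <- copy(by_isotime)
setkey(key_removed, NULL)

sizes <- c(unkeyed     = saved_size(DT3),
           by_name     = saved_size(by_name),
           by_isotime  = saved_size(by_isotime),
           key_removed = saved_size(key_removed))
print(sizes)

# In-memory size for comparison with the on-disk sizes.
print(object.size(DT3), units = "Mb")
```

On my real data, the by_isotime and key_removed cases are the ones that come out at 212 MB instead of about 90 MB.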
