Hi Guenter,

CC: data.table list

I filed this as bug #5305 and we've now fixed it in v1.8.11 (commit 1100). Thank you very much once again for reporting!
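If you'd like to pick up the fix before it reaches CRAN, here is a minimal sketch, assuming (as is usual for data.table development builds) that the current source is on R-Forge:

# Sketch: install the development build (1.8.11) containing the fix,
# then re-run the reproducible example at the size that used to crash.
# The R-Forge repos URL is an assumption; adjust if you install another way.
install.packages("data.table", repos = "http://R-Forge.R-project.org")
library(data.table)

N = 3000     # No. of groups -- the size that previously segfaulted
T = 100000   # No. of observations per group

DT = data.table(group = rep(1:N, each = T), x = 1)
setkey(DT, group)
DT[, sum_x := sum(x), by = group]   # should now complete without a crash
print(head(DT))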
On Wed, Jan 22, 2014 at 9:52 PM, "Günter J. Hitsch" <[email protected]> wrote:
>
> I’ve been using data.table for several months. It’s a great package—thank
> you for developing it!
>
> Here’s my question: I’ve run into a problem when I use “large” data
> tables with many millions of rows. In particular, for such large data
> tables I get segmentation faults when I create columns by groups. Example:
>
> library(data.table)
>
> N = 2500       # No. of groups
> T = 100000     # No. of observations per group
>
> DT = data.table(group = rep(1:N, each = T), x = 1)
> setkey(DT, group)
>
> DT[, sum_x := sum(x), by = group]
> print(head(DT))
>
> This runs fine. But when I increase the number of groups, say from 2500
> to 3000, I get a segfault:
>
> N = 3000       # No. of groups
> T = 100000     # No. of observations per group
>
> ...
>
> *** caught segfault ***
> address 0x159069140, cause 'memory not mapped'
>
> Traceback:
>  1: `[.data.table`(DT, , `:=`(sum_x, sum(x)), by = group)
>  2: DT[, `:=`(sum_x, sum(x)), by = group]
>  3: eval(expr, envir, enclos)
>  4: eval(ei, envir)
>  5: withVisible(eval(ei, envir))
>
> I can reproduce this problem on:
>
> (1) OS X 10.9, R 3.0.2, data.table 1.8.10
> (2) Ubuntu 13.10, R 3.0.1, data.table 1.8.10
>
> And of course the amount of RAM in my machines is not the issue.
>
> Thanks in advance for your help with this!
>
> Günter
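Until the fixed version is released, one possible workaround on 1.8.10 is to side-step the grouped := assignment path that the traceback shows crashing. This is a sketch only; I haven't verified it against every affected size. Aggregate into a small table and join the result back:

library(data.table)

N = 3000
T = 100000
DT = data.table(group = rep(1:N, each = T), x = 1)
setkey(DT, group)

# Aggregate per group into a small N-row table instead of assigning
# by reference within each group, then join the sums back via the key.
agg = DT[, list(sum_x = sum(x)), by = group]
DT = DT[agg]    # keyed join on "group" attaches sum_x to every row
print(head(DT))

Unlike :=, the join copies DT, so it needs roughly twice the memory, but it avoids the grouped assignment code path.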
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
