I’ve been using data.table for several months.  It’s a great package—thank you 
for developing it!

Here’s my question:  I’ve run into a problem when working with “large” data 
tables, i.e. tables with many millions of rows.  For such tables I get a 
segmentation fault when I create a column by group using :=.  Example:

library(data.table)

N = 2500      # No. of groups
T = 100000    # No. of observations per group

DT = data.table(group = rep(1:N, each = T), x = 1)
setkey(DT, group)

DT[, sum_x := sum(x), by = group]   # grouped assignment by reference
print(head(DT))

This runs fine.  But when I increase the number of groups, say from 2500 to 
3000, I get a segfault:

N = 3000      # No. of groups
T = 100000    # No. of observations per group

...

 *** caught segfault ***
address 0x159069140, cause 'memory not mapped'

Traceback:
 1: `[.data.table`(DT, , `:=`(sum_x, sum(x)), by = group)
 2: DT[, `:=`(sum_x, sum(x)), by = group]
 3: eval(expr, envir, enclos)
 4: eval(ei, envir)
 5: withVisible(eval(ei, envir))


I can reproduce this problem on:

(1) OS X 10.9, R 3.0.2, data.table 1.8.10
(2) Ubuntu 13.10, R 3.0.1, data.table 1.8.10

The amount of RAM in these machines is not the issue.
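
In case it helps while this gets looked at, here is a minimal sketch of the 
two-step fallback I have in mind (aggregate into a one-row-per-group table, 
then merge it back), assuming the crash is specific to the grouped := 
assignment; I have not tested it at the problematic scale:

library(data.table)

# Small stand-in for the DT above, just to keep the sketch self-contained.
DT = data.table(group = rep(1:3, each = 4), x = 1)
setkey(DT, group)

# Aggregate one row per group, then merge the sums back in,
# instead of assigning by reference with := inside the grouped call.
sums = DT[, list(sum_x = sum(x)), by = group]
DT = merge(DT, sums, by = "group")
print(head(DT))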

Thanks in advance for your help with this!

Günter
