Günter,
Great report! I'm able to reproduce it on 1.8.11 here. Will file a bug and look 
into it.
Thanks again for reporting.
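(If anyone wants to retest against a newer build, a quick way to check what's installed; this is just a convenience note, not part of the fix itself:)

packageVersion("data.table")   # the report below was against 1.8.10
sessionInfo()                  # R version and platform, handy when filing the bug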


Arun
From: Günter J. Hitsch
Reply: Günter J. Hitsch [email protected]
Date: January 22, 2014 at 9:52:36 PM
To: [email protected]
Subject: [datatable-help] segfault with "large" number of rows

I’ve been using data.table for several months. It’s a great package—thank you 
for developing it!  

Here’s my question: I’ve run into a problem when I use “large” data tables with 
many millions of rows. In particular, for such large data tables I get 
segmentation faults when I create columns by groups. Example:  

N = 2500         # No. of groups  
T = 100000       # No. of observations per group  

DT = data.table(group = rep(1:N, each = T), x = 1)  
setkey(DT, group)  

DT[, sum_x := sum(x), by = group]  
print(head(DT))  

This runs fine. But when I increase the number of groups, say from 2500 to 
3000, I get a segfault:  

N = 3000         # No. of groups  
T = 100000       # No. of observations per group  

...  

*** caught segfault ***  
address 0x159069140, cause 'memory not mapped'  

Traceback:  
1: `[.data.table`(DT, , `:=`(sum_x, sum(x)), by = group)  
2: DT[, `:=`(sum_x, sum(x)), by = group]  
3: eval(expr, envir, enclos)  
4: eval(ei, envir)  
5: withVisible(eval(ei, envir))  


I can reproduce this problem on:  

(1) OS X 10.9, R 3.0.2, data.table 1.8.10  
(2) Ubuntu 13.10, R 3.0.1, data.table 1.8.10  

And of course the amount of RAM on my machines is not the issue.  
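(Rough back-of-envelope estimate, not from the original mail, assuming the N = 3000 case above: the full table with the new column is only a handful of GiB, so plain object size alone shouldn't be the problem.)

N = 3000; T = 100000
rows = N * T                       # 3e8 rows
rows * (4 + 8 + 8) / 2^30          # integer group + double x + double sum_x ~= 5.6 GiB
# or, on a built table:
# print(object.size(DT), units = "GB")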

Thanks in advance for your help with this!  

Günter  

_______________________________________________  
datatable-help mailing list  
[email protected]  
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help  
