Jeremiah,

Thanks. Just a few hours ago, I answered a similar question in reply to a post 
from Ron (pasted below):

`data.table` is designed with *really large* data sets in mind (> 100 or 200 GB 
in memory, even). Therefore, as a design feature, it trades in "referential 
transparency" for manipulating data objects *as efficiently as possible* in 
terms of both *speed* and *memory usage* (most of the time they go 
hand-in-hand).

This is perhaps the biggest design choice one needs to be aware of when 
choosing or working with data.tables. It is possible to modify objects by 
reference using data.table: all the functions that begin with "set*" modify 
objects by reference. The only other function that does so, apart from the 
"set*" ones, is the `:=` operator.
There’s a pending feature request to add this point (on explicit copy) to the 
FAQs, which we’ve not gotten to yet.
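
A minimal sketch of what "by reference" means for `:=` and the "set*" 
functions, assuming the data.table package is installed. The names `DT`, 
`alias` and `snapshot` are made up for illustration:

```r
library(data.table)

DT <- data.table(x = 1:3)

alias    <- DT        # plain assignment: both names refer to the same table
snapshot <- copy(DT)  # explicit copy: independent of DT from here on

alias[, z := x * 2L]        # `:=` adds column z by reference
setnames(alias, "x", "id")  # set* functions also modify by reference

names(DT)        # "id" "z"  (changed through alias)
names(snapshot)  # "x"       (unaffected, because it was copied)
```

Only the object created with `copy()` is shielded from the later changes; 
`alias` and `DT` are the same table under two names.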

To our knowledge, people do overcome this difference quite quickly.

It’s not necessary to know about pointers to understand that the object gets 
modified in-place. I’m not a Python user at all, but I recently learned that 
this is also a feature there: https://docs.python.org/2/library/copy.html

But point taken. A note that an explicit copy is required will be added to the 
FAQs.
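
As a sketch of what such an explicit copy could look like for the example 
quoted below (again assuming the data.table package is installed; `x_ref` and 
`x_copy` are illustrative names, not from the original post):

```r
library(data.table)

dt <- data.table(x = 1:10, y = letters[10:1])

x_ref  <- dt$x        # no explicit copy: may still share memory with dt's column
x_copy <- copy(dt$x)  # explicit copy: independent of dt from here on

setkeyv(dt, "y")      # reorders the rows of dt by reference

identical(x_copy, 1:10)  # TRUE: keeps the order dt$x had when it was copied
identical(dt$x, 10:1)    # TRUE: dt itself is now sorted by y
```

Whether `x_ref` follows the new order depends on whether it still shares 
memory with the column, which is exactly the surprise described in the 
quoted message; `x_copy` does not depend on that.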


Arun

From: jeremiah rounds [email protected]
Reply: jeremiah rounds [email protected]
Date: June 14, 2014 at 7:23:22 AM
To: [email protected] 
[email protected]
Subject:  [datatable-help] Are you aware of this?  

As a fan of your work I have always been curious if you are aware of this?  I 
find it causes new users to make mistakes.


> dt = list()
> dt$x = 1:10
> dt$y = letters[10:1]
> dt = as.data.table(as.data.frame(dt))
> dt
     x y
 1:  1 j
 2:  2 i
 3:  3 h
 4:  4 g
 5:  5 f
 6:  6 e
 7:  7 d
 8:  8 c
 9:  9 b
10: 10 a
> x0 = dt$x
> x1 = dt$x
> x0[1] = 11
> setkeyv(dt,"y")
> x0
 [1] 11  2  3  4  5  6  7  8  9 10
> x1
 [1] 10  9  8  7  6  5  4  3  2  1
> x1 == x0
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE


x0 and x1 are assigned at exactly the same time, and since R data.frames will 
not do this, it lures people into thinking they are identical to each other and 
distinct from dt, as they would be with data.frames. My theory is that they are 
not actually copied: they are promised. When x0 has its index 1 changed, it 
induces a copy distinct from dt$x, but x1 has had no operation on it, so it 
still refers to dt$x with its promise. Setting the key on dt reorders it, and 
since x1 still hasn't been evaluated, it now matches the order of dt.

I found new users getting unpredictable results because they would try to use a 
data.table as a data.frame and induce this with sorts. If you thought you had 
copied something in a particular order from dt by doing the assignment ahead of 
the setkeyv, you have made a mistake. You don't really expect x1, assigned maybe 
a page of code above, to have its order changed by a setkeyv. You do if you 
think about C pointers and references, but in R you really don't think that way. 
Many R users don't even know what a pointer is.


Thanks,
Jeremiah

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] splines   parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] locfit_1.5-9.1       edgeR_3.4.2          limma_3.18.13       
[4] data.table_1.9.2     GenomicRanges_1.14.4 XVector_0.2.0       
[7] IRanges_1.20.7       BiocGenerics_0.8.0  

loaded via a namespace (and not attached):
[1] grid_3.0.1      lattice_0.20-15 plyr_1.8.1      Rcpp_0.11.1    
[5] reshape2_1.4    stats4_3.0.1    stringr_0.6.2   tools_3.0.1    



_______________________________________________  
datatable-help mailing list  
[email protected]  
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help