I think you mean:

dta[dtb, b:=b, by=.EACHI]
and not .EACHI = TRUE. Not sure what’s the use of nomatch=0L along with :=.

by=.EACHI does exactly what it means, really. It evaluates j for each i match. 
Let’s first see the matches:

dta[dtb, which=TRUE]
# [1] 1 1 3
So, first row of dtb matches with first of dta. The second of dtb matches with 
1st of dta and so on.

When you add by=.EACHI, as shown on the top, j-expression is evaluated on each 
of these matches. So, it’ll be evaluated 3-times here. On the other hand, 
without it, j is evaluated once. In this case, it doesn’t make a difference 
either way. So you should avoid by=.EACHI, as it’ll be slower with it.

It’s particularly useful when you’d like to perform operations in j, that 
depends on the values in j on that group. For example, consider these 
data.tables dt1 and dt2:

dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
dt2 = data.table(x=3:5, z=10, key="x")
And, you’d like to get sum(y)*z while joining.. If not for the by=.EACHI 
feature.. you’d approach the problem like this:

dt1[dt2][, list(agg = sum(y)*z[1]), by=x]
With by=.EACHI, this is simply:

dt1[dt2, list(agg=sum(y)*z), by=.EACHI]
Here, your expression is evaluated on each i.

Another interesting use case is, say, you’d like to create a lagged vector of y:

dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]
It’s that simple.. really. Basically, as long as the operation you’re 
performing in j affects it depending on whether j is executed for that group or 
as a whole, then you’re most likely looking for by=.EACHI. If not, by=.EACHI 
has no effect, and therefore you’re wanting to use a normal join there..

This is not a text book definition, rather my understanding of this awesome 
feature!

Hope this helps.


Arun

From: Juan Manuel Truppia <[email protected]>
Reply: Juan Manuel Truppia <[email protected]>>
Date: September 11, 2014 at 10:16:41 PM
To: [email protected] 
<[email protected]>>
Subject:  [datatable-help] Update table from other table  

What is the best data.table way of doing something similar to UPDATE FROM in 
SQL?

I used to do something like

dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx")
dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx")
dta[dtb, b := b]

However, after the 1.9.3 and the explicit .EACHI, it fails sometimes, but I 
can't determine when.

So, just to be sure, I do 

dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]

Is the .EACHI and the nomatch necessary?

In this case, I want the row with idx 1 and 3 (the matching ones) to end with a 
b value from the matching b column in dtb, and the row with idx 2 (the one that 
isn't in dtb) to end up with NA in column b.


_______________________________________________  
datatable-help mailing list  
[email protected]  
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Reply via email to