Arun, sometimes it helps to have nomatch = 0 when using :=

dta = data.table(idx = c(1,1), key = "idx")
dtb = data.table(idx = c(1,2), val = c("a", "b"), key = "idx")

This fails because of cartesian join not allowed by default

That's because of a bug. `allow.cartesian` error shouldn't occur with `:=` at 
all, as the number of rows will *never* exceed `x`. IIRC, the allow.cartesian 
bugs are scheduled to be fixed for the next-next release (after 1.9.4).

Arun

From: Juan Manuel Truppia <[email protected]>
Reply: Juan Manuel Truppia <[email protected]>>
Date: September 15, 2014 at 7:52:26 PM
To: Arunkumar Srinivasan <[email protected]>>
Cc: [email protected] 
<[email protected]>>
Subject:  Re: [datatable-help] Update table from other table  

Arun, sometimes it helps to have nomatch = 0 when using :=

dta = data.table(idx = c(1,1), key = "idx")
dtb = data.table(idx = c(1,2), val = c("a", "b"), key = "idx")

This fails because of cartesian join not allowed by default

dta[dtb, val := i.val]

but this doesn't

dta[dtb, val := i.val, nomatch = 0]

This is the same as doing

dta[dtb, val := i.val, allow.cartesian = TRUE]

On Fri, Sep 12, 2014 at 8:08 PM, Arunkumar Srinivasan <[email protected]> 
wrote:
Glad it helped.
Always welcome "pull requests" :).

Arun

From: Juan Manuel Truppia <[email protected]>
Reply: Juan Manuel Truppia <[email protected]>>
Date: September 12, 2014 at 5:46:59 PM
To: Arunkumar Srinivasan <[email protected]>>
Cc: [email protected] 
<[email protected]>>
Subject:  Re: [datatable-help] Update table from other table

Great! (sorry, .EACHI = TRUE was an old definition).
It's good to know also that nomatch = 0 is irrelevant when using :=, I always 
used is to avoid the rows in dtb creeping in dtb as NAs.
Also, it's really useful to know that by = EACHI should be used when the 
calculations you are perfoming depend on the group or not. This came in really 
in handy yesterday, and should be emphasized in .EACHI description. Should I 
perform a pull request?

On Fri, Sep 12, 2014 at 12:14 PM, Arunkumar Srinivasan <[email protected]> 
wrote:
I think you mean:

dta[dtb, b:=b, by=.EACHI]

and not .EACHI = TRUE. Not sure what’s the use of nomatch=0L along with :=.

by=.EACHI does exactly what it means, really. It evaluates j for each i match. 
Let’s first see the matches:

dta[dtb, which=TRUE]
# [1] 1 1 3

So, first row of dtb matches with first of dta. The second of dtb matches with 
1st of dta and so on.

When you add by=.EACHI, as shown on the top, j-expression is evaluated on each 
of these matches. So, it’ll be evaluated 3-times here. On the other hand, 
without it, j is evaluated once. In this case, it doesn’t make a difference 
either way. So you should avoid by=.EACHI, as it’ll be slower with it.

It’s particularly useful when you’d like to perform operations in j, that 
depends on the values in j on that group. For example, consider these 
data.tables dt1 and dt2:

dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
dt2 = data.table(x=3:5, z=10, key="x")

And, you’d like to get sum(y)*z while joining.. If not for the by=.EACHI 
feature.. you’d approach the problem like this:

dt1[dt2][, list(agg = sum(y)*z[1]), by=x]

With by=.EACHI, this is simply:

dt1[dt2, list(agg=sum(y)*z), by=.EACHI]

Here, your expression is evaluated on each i.

Another interesting use case is, say, you’d like to create a lagged vector of y:

dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]

It’s that simple.. really. Basically, as long as the operation you’re 
performing in j affects it depending on whether j is executed for that group or 
as a whole, then you’re most likely looking for by=.EACHI. If not, by=.EACHI 
has no effect, and therefore you’re wanting to use a normal join there..

This is not a text book definition, rather my understanding of this awesome 
feature!

Hope this helps.


Arun

From: Juan Manuel Truppia <[email protected]>
Reply: Juan Manuel Truppia <[email protected]>>
Date: September 11, 2014 at 10:16:41 PM
To: [email protected] 
<[email protected]>>
Subject:  [datatable-help] Update table from other table

What is the best data.table way of doing something similar to UPDATE FROM in 
SQL?

I used to do something like

dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx")
dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx")
dta[dtb, b := b]

However, after the 1.9.3 and the explicit .EACHI, it fails sometimes, but I 
can't determine when.

So, just to be sure, I do 

dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]

Is the .EACHI and the nomatch necessary?

In this case, I want the row with idx 1 and 3 (the matching ones) to end with a 
b value from the matching b column in dtb, and the row with idx 2 (the one that 
isn't in dtb) to end up with NA in column b.


_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Reply via email to