Arun, sometimes it helps to have nomatch = 0 when using :=
dta = data.table(idx = c(1,1), key = "idx")
dtb = data.table(idx = c(1,2), val = c("a", "b"), key = "idx")
This fails because of cartesian join not allowed by default
That's because of a bug. `allow.cartesian` error shouldn't occur with `:=` at
all, as the number of rows will *never* exceed `x`. IIRC, the allow.cartesian
bugs are scheduled to be fixed for the next-next release (after 1.9.4).
Arun
From: Juan Manuel Truppia <[email protected]>
Reply: Juan Manuel Truppia <[email protected]>>
Date: September 15, 2014 at 7:52:26 PM
To: Arunkumar Srinivasan <[email protected]>>
Cc: [email protected]
<[email protected]>>
Subject: Re: [datatable-help] Update table from other table
Arun, sometimes it helps to have nomatch = 0 when using :=
dta = data.table(idx = c(1,1), key = "idx")
dtb = data.table(idx = c(1,2), val = c("a", "b"), key = "idx")
This fails because of cartesian join not allowed by default
dta[dtb, val := i.val]
but this doesn't
dta[dtb, val := i.val, nomatch = 0]
This is the same as doing
dta[dtb, val := i.val, allow.cartesian = TRUE]
On Fri, Sep 12, 2014 at 8:08 PM, Arunkumar Srinivasan <[email protected]>
wrote:
Glad it helped.
Always welcome "pull requests" :).
Arun
From: Juan Manuel Truppia <[email protected]>
Reply: Juan Manuel Truppia <[email protected]>>
Date: September 12, 2014 at 5:46:59 PM
To: Arunkumar Srinivasan <[email protected]>>
Cc: [email protected]
<[email protected]>>
Subject: Re: [datatable-help] Update table from other table
Great! (sorry, .EACHI = TRUE was an old definition).
It's good to know also that nomatch = 0 is irrelevant when using :=, I always
used is to avoid the rows in dtb creeping in dtb as NAs.
Also, it's really useful to know that by = EACHI should be used when the
calculations you are perfoming depend on the group or not. This came in really
in handy yesterday, and should be emphasized in .EACHI description. Should I
perform a pull request?
On Fri, Sep 12, 2014 at 12:14 PM, Arunkumar Srinivasan <[email protected]>
wrote:
I think you mean:
dta[dtb, b:=b, by=.EACHI]
and not .EACHI = TRUE. Not sure what’s the use of nomatch=0L along with :=.
by=.EACHI does exactly what it means, really. It evaluates j for each i match.
Let’s first see the matches:
dta[dtb, which=TRUE]
# [1] 1 1 3
So, first row of dtb matches with first of dta. The second of dtb matches with
1st of dta and so on.
When you add by=.EACHI, as shown on the top, j-expression is evaluated on each
of these matches. So, it’ll be evaluated 3-times here. On the other hand,
without it, j is evaluated once. In this case, it doesn’t make a difference
either way. So you should avoid by=.EACHI, as it’ll be slower with it.
It’s particularly useful when you’d like to perform operations in j, that
depends on the values in j on that group. For example, consider these
data.tables dt1 and dt2:
dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x")
dt2 = data.table(x=3:5, z=10, key="x")
And, you’d like to get sum(y)*z while joining.. If not for the by=.EACHI
feature.. you’d approach the problem like this:
dt1[dt2][, list(agg = sum(y)*z[1]), by=x]
With by=.EACHI, this is simply:
dt1[dt2, list(agg=sum(y)*z), by=.EACHI]
Here, your expression is evaluated on each i.
Another interesting use case is, say, you’d like to create a lagged vector of y:
dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI]
It’s that simple.. really. Basically, as long as the operation you’re
performing in j affects it depending on whether j is executed for that group or
as a whole, then you’re most likely looking for by=.EACHI. If not, by=.EACHI
has no effect, and therefore you’re wanting to use a normal join there..
This is not a text book definition, rather my understanding of this awesome
feature!
Hope this helps.
Arun
From: Juan Manuel Truppia <[email protected]>
Reply: Juan Manuel Truppia <[email protected]>>
Date: September 11, 2014 at 10:16:41 PM
To: [email protected]
<[email protected]>>
Subject: [datatable-help] Update table from other table
What is the best data.table way of doing something similar to UPDATE FROM in
SQL?
I used to do something like
dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx")
dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx")
dta[dtb, b := b]
However, after the 1.9.3 and the explicit .EACHI, it fails sometimes, but I
can't determine when.
So, just to be sure, I do
dta[dtb, b := b, .EACHI = TRUE, nomatch = 0]
Is the .EACHI and the nomatch necessary?
In this case, I want the row with idx 1 and 3 (the matching ones) to end with a
b value from the matching b column in dtb, and the row with idx 2 (the one that
isn't in dtb) to end up with NA in column b.
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help