I think you mean: dta[dtb, b:=b, by=.EACHI] and not .EACHI = TRUE. Not sure what’s the use of nomatch=0L along with :=.
by=.EACHI does exactly what it means, really. It evaluates j for each i match. Let’s first see the matches: dta[dtb, which=TRUE] # [1] 1 1 3 So, first row of dtb matches with first of dta. The second of dtb matches with 1st of dta and so on. When you add by=.EACHI, as shown on the top, j-expression is evaluated on each of these matches. So, it’ll be evaluated 3-times here. On the other hand, without it, j is evaluated once. In this case, it doesn’t make a difference either way. So you should avoid by=.EACHI, as it’ll be slower with it. It’s particularly useful when you’d like to perform operations in j, that depends on the values in j on that group. For example, consider these data.tables dt1 and dt2: dt1 = data.table(x=rep(1:4, each=2), y=1:8, key="x") dt2 = data.table(x=3:5, z=10, key="x") And, you’d like to get sum(y)*z while joining.. If not for the by=.EACHI feature.. you’d approach the problem like this: dt1[dt2][, list(agg = sum(y)*z[1]), by=x] With by=.EACHI, this is simply: dt1[dt2, list(agg=sum(y)*z), by=.EACHI] Here, your expression is evaluated on each i. Another interesting use case is, say, you’d like to create a lagged vector of y: dt1[dt2, list(y=y, lagy = c(NA, head(y,-1)), z=z), by=.EACHI] It’s that simple.. really. Basically, as long as the operation you’re performing in j affects it depending on whether j is executed for that group or as a whole, then you’re most likely looking for by=.EACHI. If not, by=.EACHI has no effect, and therefore you’re wanting to use a normal join there.. This is not a text book definition, rather my understanding of this awesome feature! Hope this helps. Arun From: Juan Manuel Truppia <[email protected]> Reply: Juan Manuel Truppia <[email protected]>> Date: September 11, 2014 at 10:16:41 PM To: [email protected] <[email protected]>> Subject: [datatable-help] Update table from other table What is the best data.table way of doing something similar to UPDATE FROM in SQL? I used to do something like dta = data.table(idx = c(1, 2, 3), a = runif(3), key = "idx") dtb = data.table(idx = c(1, 3), b = runif(3), key = "idx") dta[dtb, b := b] However, after the 1.9.3 and the explicit .EACHI, it fails sometimes, but I can't determine when. So, just to be sure, I do dta[dtb, b := b, .EACHI = TRUE, nomatch = 0] Is the .EACHI and the nomatch necessary? In this case, I want the row with idx 1 and 3 (the matching ones) to end with a b value from the matching b column in dtb, and the row with idx 2 (the one that isn't in dtb) to end up with NA in column b. _______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
_______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
