Arun,

Yes, DT1[DT2, y, .JOIN = FALSE] would do the same as DT1[DT2][, y] does currently.

No, DT1[DT2, y, .JOIN = FALSE] will NOT do a by-without-by. A by-without-by is literally a 'by' over each of the rows of DT2 that take part in the join (thus each.i!): the operation 'y' is performed for each row of 'i', and the results are then combined and returned.

There is no efficiency issue here that I can see, but Matthew can correct me on this. As far as I understand, efficiency comes into play when, e.g., the rows of 'i' are unique and after the join you'd like to do a 'by' over them. In that case DT1[DT2][, j, by = key(DT1)] would be less efficient, since the 'by' could already have been done while joining.
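To make the distinction concrete, here is a minimal sketch. It does not use the proposed syntax: `.JOIN` and `each.i` are only names under discussion in this thread. The sketch instead assumes data.table 1.9.4 or later, where by-without-by has to be requested explicitly with by = .EACHI and a plain DT1[DT2, j] evaluates j once over the joined rows. The table contents and variable names are made up for illustration.

```r
library(data.table)

# Illustrative data (assumed, not from the thread): DT1 keyed on x,
# DT2 holding unique join values.
DT1 <- data.table(x = c(1, 1, 2, 3, 3), y = 1:5, z = 6:10)
setkey(DT1, x)
DT2 <- data.table(x = c(1, 3))

# Join first, then evaluate j once over all joined rows.
join_then_j <- DT1[DT2][, sum(y)]

# Same result in one step on data.table >= 1.9.4 (assumed version).
j_on_join <- DT1[DT2, sum(y)]

# j evaluated once per row of DT2 while joining (by-without-by).
per_i_row <- DT1[DT2, sum(y), by = .EACHI]

# Join first, then group afterwards: same numbers, but the grouping
# is a second pass over the already joined result.
by_after_join <- DT1[DT2][, sum(y), by = x]

print(join_then_j)
print(per_i_row)
print(by_after_join)
```

The last two results carry the same per-group sums; the difference is only in when the grouping happens, which is the efficiency point made above.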
DT1[DT2, .JOIN=FALSE] would be equivalent to both the current and the future DT1[DT2]; in this expression there is no by-without-by happening in either case.

The purpose of this is NOT limited to j being just a column or an expression that evaluates to a single column. It applies to any j. The extra 'by-without-by' column is currently output regardless of how many columns your j-expression returns. The behavior is very similar to what you get when you specify a by=, except that the 'by' is done on a very special expression, one that only exists while joining two data.tables and that generally doesn't exist before or after the join.

Hope this answers your questions.

On Tue, Apr 30, 2013 at 8:48 AM, Arunkumar Srinivasan <[email protected]> wrote:

> Eduard, thanks for your reply. But some things are unclear to me still.
> I'll try to explain them below.
>
> First I prefer .JOIN (or cross.apply) just because `each.i` seems general
> (that it is applicable to *every* i operation, which as of now seems
> untrue). .JOIN is specific to data.table type for `i`.
>
> From what I understand from your reply, if (.JOIN = FALSE), then,
>
> DT1[DT2, y, .JOIN = FALSE] <=> DT1[DT2][, y]
>
> Is this right? It's a bit confusing because I think you're okay with
> "by-without-by" and I got the impression from Sadao that he finds the
> syntax of "by-without-by" inaccessible/advanced for basic users. So, just
> to clarify, here the DT1[DT2, y, .JOIN=FALSE] will still do the
> "by-without-by" and then result in a "vector", right?
>
> Matthew explains in the current documentation that DT1[DT2][, y] would
> "join" all columns of DT1 and DT2 and then subset. I assume the
> implementation underneath is *not* DT1[DT2][, y] rather the result is an
> efficient equivalence. Then, that of course seems alright to me.
>
> If what I've told so far is right, then the syntax `DT1[DT2, .JOIN=FALSE]`
> doesn't make sense/has no purpose to me. At least I can't think of any at
> the moment.
>
> To conclude, IMHO, if the purpose of `.JOIN` is to provide the same as
> DT1[i, j] for DT1[DT2, j] (j being a column or an expression that results
> in getting evaluated as a scalar for every group in the current
> by-without-by syntax), then, I find this is covered in `drop = TRUE/FALSE`.
> Correct me if I am wrong. But, one could do: `DT1[DT2, j, drop=TRUE]`
> instead of `DT1[DT2, j, .JOIN=FALSE]` and DT1[i, j, drop=FALSE] instead of
> DT1[i, list(x,y)].
>
> If you/anyone believes it's wrong, I'd be all ears to clarify as to what's
> the purpose of `drop` then (and also how it *doesn't* suit here as compared
> to .JOIN).
>
> Arun
>
> On Tuesday, April 30, 2013 at 2:54 PM, Eduard Antonyan wrote:
>
> Arun,
>
> If the new boolean is false, the result would be the same as without it
> and would be equal to current behavior of d[i][, j]. If it's true, it will
> only have an effect if i is a join (I think each.i= fits slightly better
> for this description than .join=) - this will replicate current underlying
> behavior. If you think the cross-apply is something that could work not
> just for i being a data-table but other things as well, then it would make
> perfect sense to implement that action too when the bool is true.
>
> On Apr 30, 2013, at 2:58 AM, Arunkumar Srinivasan <[email protected]>
> wrote:
>
> (The earlier message was too long and was rejected.)
> So, from the discussion so far, I see that Matthew is nice enough to
> implement `.JOIN` or `cross.apply`. I've a couple of questions. Suppose,
>
> DT1 <- data.table(x=c(1,1,2,3,3), y=1:5, z=6:10)
> setkey(DT1, "x")
> DT2 <- data.table(x=1)
> DT1[DT2, y, .JOIN=TRUE] # I guess the syntax is something like this. I
> expect here the same output as current DT1[DT2, y]
>
> The above syntax seems "okay". But my first question is what is
> `.JOIN=FALSE` supposed to do under these two circumstances?
> Suppose,
>
> DT1 <- data.table(x=c(1,1,2,3,3), y=1:5, z=6:10)
> setkey(DT1, "x")
> DT2 <- data.table(x=c(1,2,1), w=c(11:13))
> # what's the output supposed to be for?
> DT1[DT2, y, .JOIN=FALSE]
> DT1[DT2, .JOIN = FALSE]
>
> Depending on this I'd have to think about `drop = TRUE/FALSE`. Also, how
> does it work with `subset`?
>
> DT1[x %in% c(1,2,1), y, .JOIN=TRUE] # .JOIN is ignored?
> Is this supposed to also do a "cross-apply" on the logical subset? I
> guess not. So, .JOIN is an "extra" parameter that comes into play *only*
> when `i` is a `data.table`?
>
> I'd love to have some replies to these questions for me to take a stance
> on `.JOIN`. Thank you.
>
> Best,
> Arun.
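
The questions in the quoted message can be tried against its own example tables. The following is only a hedged sketch: `.JOIN` was a proposal in this thread and is not used here. It assumes data.table 1.9.4 or later, where a plain DT1[DT2, j] evaluates j over the joined rows and by = .EACHI requests the per-row grouping. The result variable names are hypothetical.

```r
library(data.table)

# Tables taken from the quoted message.
DT1 <- data.table(x = c(1, 1, 2, 3, 3), y = 1:5, z = 6:10)
setkey(DT1, "x")
DT2 <- data.table(x = c(1, 2, 1), w = 11:13)

# Plain join: the matching DT1 rows for each row of DT2, in DT2's order.
joined <- DT1[DT2]

# j evaluated on the joined rows as a whole: a plain vector.
y_vec <- DT1[DT2, y]

# j evaluated once per row of DT2: a data.table that also carries
# the join column.
y_each <- DT1[DT2, y, by = .EACHI]

# Logical subset in i: no join takes place, so there is nothing to
# group per row of i.
y_subset <- DT1[x %in% c(1, 2, 1), y]

print(joined)
print(y_vec)
print(y_each)
print(y_subset)
```

Note that the repeated value 1 in DT2 produces its matching rows twice in the join, while the logical subset returns each matching row of DT1 only once.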
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
