Sorry the proposed result was a wrong paste in the last message:
# proposed way and the result:
DT1[DT2, sum(y), .join = FALSE]
[1] 6 9 6
And the last part that it *should* be a data.table is quite obvious then.
Arun
On Thursday, May 2, 2013 at 12:16 AM, Arunkumar Srinivasan wrote:
> Eduard,
>
> Great. That explains me the difference between `drop` and `.join` here.
> Even though I don't *need* this feature (I can't recall the last time when I
> use a `data.table` for `i` and had to reduce the function, say, sum). But, I
> think it can only better the usage.
>
> However, there's one point *I think* would still disagree with @eddi here,
> not sure.
>
> DT1 <- data.table(x=c(1,1,1,2,2), y=1:5)
> DT2 <- data.table(x=c(1,2,1))
> setkey(DT1, "x")
>
> # proposed way and the result:
> DT1[DT2, sum(y), .join = FALSE]
> [1] 21
>
>
> So far nice. However, the operation `DT1[DT2, sum(y), .join = TRUE]` *should*
> result in a `data.table` output as follows (it's even more clearer now that
> .join is set to TRUE, meaning it's a data.table join):
>
> x V1
> 1: 1 6
> 2: 2 9
> 3: 1 6
>
>
> Basically, `.join = TRUE` is the current functionality unchanged and nice to
> be default (as Matthew hinted).
>
> Arun
>
>
> On Tuesday, April 30, 2013 at 5:03 PM, Eduard Antonyan wrote:
>
> > Arun,
> >
> > Yes, DT1[DT2, y, .JOIN = FALSE] would do the same as DT1[DT2][, y] does
> > currently.
> > No, DT1[DT2, y, .JOIN=FALSE], will NOT do a by-without-by, which is
> > literally a 'by' by each of the rows of DT2 that are in the join (thus
> > each.i! - the operation 'y' will be performed for each of the rows of 'i'
> > and then combined and returned). There is no efficiency issue here that I
> > can see, but Matthew can correct me on this. As far as I understand the
> > efficiency comes into play when e.g. the rows of 'i' are unique, and after
> > the join you'd like to do a 'by' by those, then DT1[DT2][, j, by =
> > key(DT1)] would be less efficient since the 'by' could've already been done
> > while joining.
> >
> > DT1[DT2, .JOIN=FALSE] would be equivalent to both current and future
> > DT1[DT2] - in this expression there is no by-without-by happening in either
> > case.
> >
> > The purpose of this is NOT for j just being a column or an expression that
> > gets evaluated into a signal column. It applies to any j. The extra
> > 'by-without-by' column is currently output independently of how many
> > columns you output in your j-expression, the behavior is very similar as to
> > when you specify a by=., except that the 'by' happens by a very special
> > expression, that only exists when joining two data-tables and that
> > generally doesn't exist before or after the join.
> >
> > Hope this answers your questions.
> >
> >
> > On Tue, Apr 30, 2013 at 8:48 AM, Arunkumar Srinivasan
> > <[email protected] (mailto:[email protected])> wrote:
> > > Eduard, thanks for your reply. But somethings are unclear to me still.
> > > I'll try to explain them below.
> > >
> > > First I prefer .JOIN (or cross.apply) just because `each.i` seems general
> > > (that it is applicable to *every* i operation, which as of now seems
> > > untrue). .JOIN is specific to data.table type for `i`.
> > >
> > > From what I understand from your reply, if (.JOIN = FALSE), then,
> > >
> > > DT1[DT2, y, .JOIN = FALSE] <=> DT1[DT2][, y]
> > >
> > > Is this right? It's a bit confusing because I think you're okay with
> > > "by-without-by" and I got the impression from Sadao that he finds the
> > > syntax of "by-without-by" unaccessible/advanced for basic users. So, just
> > > to clarify, here the DT1[DT2, y, .JOIN=FALSE] will still do the
> > > "by-without-by" and then result in a "vector", right?
> > >
> > > Matthew explains in the current documentation that DT1[DT2][, y] would
> > > "join" all columns of DT1 and DT2 and then subset. I assume the
> > > implementation underneath is *not* DT1[DT2][, y] rather the result is an
> > > efficient equivalence. Then, that of course seems alright to me.
> > >
> > > If what I've told so far is right, then the syntax `DT1[DT2,
> > > .JOIN=FALSE]` doesn't make sense/has no purpose to me. At least I can't
> > > think of any at the moment.
> > >
> > > To conclude, IMHO, if the purpose of `.JOIN` is to provide the same as
> > > DT1[i, j] for DT1[DT2, j] (j being a column or an expression that results
> > > in getting evaluated as a scalar for every group in the current
> > > by-without-by syntax), then, I find this is covered in `drop =
> > > TRUE/FALSE`. Correct me if I am wrong. But, one could do: `DT1[DT2, j,
> > > drop=TRUE]` instead of `DT1[DT2, j, .JOIN=FALSE]` and DT1[i, j,
> > > drop=FALSE] instead of DT1[i, list(x,y)].
> > >
> > > If you/anyone believes it's wrong, I'd be all ears to clarify as to
> > > what's the purpose of `drop` then (and also how it *doesn't* suit here as
> > > compared to .JOIN).
> > >
> > > Arun
> > >
> > >
> > > On Tuesday, April 30, 2013 at 2:54 PM, Eduard Antonyan wrote:
> > >
> > > > Arun,
> > > >
> > > > If the new boolean is false, the result would be the same as without it
> > > > and would be equal to current behavior of d[i][, j]. If it's true, it
> > > > will only have an effect if i is a join (I think each.i= fits slightly
> > > > better for this description than .join=) - this will replicate current
> > > > underlying behavior. If you think the cross-apply is something that
> > > > could work not just for i being a data-table but other things as well,
> > > > then it would make perfect sense to implement that action too when the
> > > > bool is true.
> > > >
> > > > On Apr 30, 2013, at 2:58 AM, Arunkumar Srinivasan
> > > > <[email protected] (mailto:[email protected])> wrote:
> > > >
> > > > > (The earlier message was too long and was rejected.)
> > > > > So, from the discussion so far, I see that Matthew is nice enough to
> > > > > implement `.JOIN` or `cross.apply`. I've a couple of questions.
> > > > > Suppose,
> > > > >
> > > > > DT1 <- data.table(x=c(1,1,2,3,3), y=1:5, z=6:10)
> > > > > setkey(DT1, "x")
> > > > > DT2 <- data.table(x=1)
> > > > > DT1[DT2, y, .JOIN=TRUE] # I guess the syntax is something like
> > > > > this. I expect here the same output as current DT1[DT2, y]
> > > > >
> > > > > The above syntax seems "okay". But my first question is what is
> > > > > `.JOIN=FALSE` supposed to do under these two circumstances? Suppose,
> > > > >
> > > > > DT1 <- data.table(x=c(1,1,2,3,3), y=1:5, z=6:10)
> > > > > setkey(DT1, "x")
> > > > > DT2 <- data.table(x=c(1,2,1), w=c(11:13))
> > > > > # what's the output supposed to be for?
> > > > > DT1[DT2, y, .JOIN=FALSE]
> > > > > DT1[DT2, .JOIN = FALSE]
> > > > >
> > > > > Depending on this I'd have to think about `drop = TRUE/FALSE`. Also,
> > > > > how does it work with `subset`?
> > > > >
> > > > > DT1[x %in% c(1,2,1), y, .JOIN=TRUE] # .JOIN is ignored?
> > > > >
> > > > > Is this supposed to also do a "cross-apply" on the logical subset? I
> > > > > guess not. So, .JOIN is an "extra" parameter that comes into play
> > > > > *only* when `i` is a `data.table`?
> > > > >
> > > > > I'd love to have some replies to these questions for me to take a
> > > > > stance on `.JOIN`. Thank you.
> > > > >
> > > > > Best,
> > > > > Arun.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
>
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help