Re: [datatable-help] datatable-help Digest, Vol 28, Issue 2

Yike Lu Thu, 07 Jun 2012 11:26:25 -0700


On 6/7/2012 1:56 PM, Matthew Dowle wrote:


To clear up (hopefully) these points for completeness ...

So it seems both our mental models are somewhat wrong - for me, Y isn't
necessarily the larger table, for you Y isn't necessarily the subset.


Perhaps it helps to recall what inspired data.table in the first place:
A[B] syntax in base.  See FAQ 2.14.  Rightly or wrongly, that's really how
I think about X[Y]; i.e., just like matrix A[B] but extended to more
dimensions each of potentially different type.

This makes sense - I don't often use matrices directly and my background for this type of data analysis work relies heavily on kdb's SQL dialect, so I find it harder to think about it as a generalization of matrix A[B].
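
For reference, the base R behaviour being generalized can be sketched as follows. This is only an illustration with made-up values (the names A and B follow the thread, the data is assumed): indexing a matrix by a two-column matrix picks one element per row of the index.

```r
# A is a 3x4 integer matrix, filled column by column with 1..12.
A <- matrix(1:12, nrow = 3, ncol = 4)

# Each row of B is a (row, column) pair to look up in A.
B <- cbind(c(1, 3), c(2, 4))

# Picks A[1, 2] and A[3, 4], one result per row of B.
picked <- A[B]
print(picked)
```

X[Y] in data.table reads the same way: each row of Y is looked up in X by X's key columns.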

The reason I like thinking in joins is I prefer infix syntax. More
elegant to me to say setkey(Y, a, b) %lj% setkey(X, a, b) than setkey(X,
a, b)[setkey(Y, a, b)], and the chaining versus nested is easy as well...

X[Y][Z] =>  (X %rj% Y) %rj% Z
X[Y[Z]] =>  X %rj% (Y %rj% Z)


Interesting. Have you defined these infix aliases yourself and that's how
you use data.table then? How to control the infix op by passing arguments
such as nomatch=0|NA, roll, mult?

Yes, this is how I use the joins. The infix notation is more attractive to me than the accessor [ ] one. The accessor unfortunately often results in heavy nesting. Data.table already uses lots of nesting in the first place: DT[, list( x = f(a,b,c))]


I defined nomatch=0 as %ij%, which is commutative.

For roll I use %aj% (asof join). I haven't used this very much, though. I may define a not-infix version aj(DT1, DT2, keys) or something similar.
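
A minimal sketch of what such infix aliases could look like. These definitions are assumptions for illustration, not the actual ones used here; they simply wrap X[Y] and its nomatch and roll arguments, and they expect both tables to be keyed already. The tables X and Y and their columns are made up.

```r
library(data.table)

# Hypothetical wrappers around X[Y]; names follow the thread.
`%rj%` <- function(x, y) x[y]                # X[Y]: one row per row of y
`%lj%` <- function(x, y) y[x]                # Y %lj% X is the same as X[Y]
`%ij%` <- function(x, y) x[y, nomatch = 0L]  # keep only rows that match
`%aj%` <- function(x, y) x[y, roll = TRUE]   # asof join: roll last value forward

# Made-up keyed tables for illustration.
X <- data.table(a = c(1L, 2L, 3L), v = c("x1", "x2", "x3"), key = "a")
Y <- data.table(a = c(2L, 3L, 4L), w = c("y2", "y3", "y4"), key = "a")

rj <- X %rj% Y   # a = 2, 3, 4; v is NA where X has no match
lj <- Y %lj% X   # same rows as X %rj% Y
ij <- X %ij% Y   # a = 2, 3 only
aj <- X %aj% Y   # unmatched a = 4 rolls forward from the prior row of X
```

Wrappers like these fix the join arguments in the operator name, so mult, or a different roll setting, would need either another operator or a plain function with extra parameters.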

For mult, I will actually go and hand code the join. But for my use cases, it's often fast enough to do something dumb like (X[,list(x=x[1])] %ij% Y) or the like. The thing is, I have a lot of mixed data, so I end up having to put all the columns in the select anyways, even if I do use mult="first". Also, to do that I would have to use the idiom setkey(DT[, list(k1, k2, v1, v2, v3)], k1, k2)[J(k1, k2), mult="first"]


Actually come to think of it, that isn't so bad.
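
Spelled out on a toy table, that idiom might look like the sketch below. The table DT, its columns, and the lookup values are made up for illustration; only setkey(), J() and mult="first" are data.table's own.

```r
library(data.table)

# Made-up table with a duplicated key (k1 = "a", k2 = 1).
DT <- data.table(k1 = c("a", "a", "b"),
                 k2 = c(1L, 1L, 2L),
                 v1 = c(10, 20, 30),
                 v2 = c("p", "q", "r"))

# Select the columns of interest, key the result, then look up one key
# and keep only the first matching row.
res <- setkey(DT[, list(k1, k2, v1, v2)], k1, k2)[J("a", 1L), mult = "first"]
print(res)
```

With mult="first" the lookup returns a single row per key even though two rows match.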

And I do see where you're coming from as far as JIS and using the additional arguments. It would be much more efficient and result in substantial speed gains.

I may start making more of an effort to use data.table this way. And I may rewrite %lj% / %rj% / %ij% to parse and use JIS.

I was wondering what the differences/similarities there were to merge().


See FAQ 1.12: "What is the difference between X[Y] and merge(X,Y)?"

I see, then in my current usage, there isn't that big of a difference, as I'm being very inefficient with my infix ops anyways.

