Hi,

Admittedly I don't use SJ very much. ?SJ says it's for :
   DT[SJ(...)]
where ... has :
"Each argument is a vector. Generally each vector is the same length but if they are not then usual silent repitition is applied."
So it's not really for :
   X[ SJ(Y) ]
since
   X[Y]
is already that. Or maybe other ways I use sometimes :
   X[setkey(Y)]
or
   X[setkey(Y,...)]
or
   X[setkey(copy(Y),...)]

So SJ() is more for constructing a data.table from vectors, in the spirit of J() originally being a mere alias for data.table().
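To make that concrete, here is a minimal sketch of SJ() building a keyed data.table straight from a vector. The column name `id` and the object names `lookup` and `by_hand` are just illustrative, and the "by hand" version is only my rough equivalent, not a statement about how SJ() is implemented.

```r
library(data.table)

# SJ() takes vectors, builds a data.table from them and keys it,
# which sorts the rows by those columns.
lookup <- SJ(id = c(3L, 1L, 2L))
print(lookup)        # rows are returned sorted by id
print(key(lookup))   # the key is the column we supplied

# Roughly the same thing spelled out by hand (illustrative only):
# build the table, then key it by all of its columns.
by_hand <- setkey(data.table(id = c(3L, 1L, 2L)))
print(identical(lookup$id, by_hand$id))
```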

Let's say you have randomly ordered ids in vector 'ids' and X is keyed by id.

   X[J(ids)]    # look up data and return it in the same order as ids is ordered (each lookup is a new binary search)
   X[SJ(ids)]   # sort ids first, binary merge (bit faster if i is keyed too), and return data in sorted order, keyed by id too
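A small runnable sketch of the difference, assuming a toy table (the `value` column and the sample ids are made up for illustration):

```r
library(data.table)

# X is keyed by id; ids is a randomly ordered lookup vector.
X <- data.table(id = 1:4, value = c("a", "b", "c", "d"), key = "id")
ids <- c(3L, 1L, 4L)

# J(): rows come back in the order of ids.
in_ids_order <- X[J(ids)]
print(in_ids_order)

# SJ(): ids are sorted first, so rows come back in sorted order.
in_sorted_order <- X[SJ(ids)]
print(in_sorted_order)
print(key(in_sorted_order))   # inspect whether the result carries a key
```

Both calls return the same rows; only the row order (and whether the result can stay keyed) differs.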

That's the idea anyway. Sometimes, if I'm not sure whether the input vector is sorted, I'll use SJ() just to make sure. There may be a shortcut in there that uses is.unsorted first to save the cost of sorting (and if not, there probably should be).
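If you wanted that check at the user level, something like the following would do. This is only a sketch: `lookup_sorted` is a hypothetical helper name of mine, and it says nothing about what data.table does or doesn't do internally.

```r
library(data.table)

# Hypothetical helper (not part of data.table): only pay for the sort
# in SJ() when the ids are not already sorted.
lookup_sorted <- function(X, ids) {
  # is.unsorted() returns NA when ids contains NA, so treat that case
  # as "needs sorting" to stay on the safe side.
  if (anyNA(ids) || is.unsorted(ids)) {
    X[SJ(ids)]
  } else {
    X[J(ids)]
  }
}

X <- data.table(id = 1:4, value = c("a", "b", "c", "d"), key = "id")
res_unsorted_input <- lookup_sorted(X, c(3L, 1L, 4L))
res_sorted_input   <- lookup_sorted(X, c(1L, 2L))
print(res_unsorted_input)
print(res_sorted_input)
```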

X must be keyed. Y having a key is optional, but if Y has a key too it will take advantage of it. Obviously speed differences will depend on many factors including the number of rows in Y, the number of columns in the join, the number of rows in X and the number of rows in the result. And there is a known potential performance improvement in this area (i.e. when both X and Y are keyed), although quite a bit was done already last year in particular for character vector joins. [Types make a large difference in benchmarks.]
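For the X[Y] forms mentioned earlier, a minimal sketch with made-up tables (the `extra` column is illustrative). It shows Y used unkeyed and then keyed via a copy so that Y itself isn't reordered; it doesn't measure speed, which as said depends on many factors.

```r
library(data.table)

X <- data.table(id = 1:4, value = c("a", "b", "c", "d"), key = "id")
Y <- data.table(id = c(4L, 2L), extra = c("p", "q"))

# Y has no key: its first column is matched against X's key,
# and rows come back in Y's order.
unkeyed_join <- X[Y]
print(unkeyed_join)

# Key a copy of Y first: the join can use Y's key and the result
# comes back in sorted order. copy() leaves Y itself untouched.
keyed_join <- X[setkey(copy(Y), id)]
print(keyed_join)
print(Y)   # still in its original order
```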

Matthew


On 30.06.2013 15:04, Gabor Grothendieck wrote:
Consider SJ which I assume was intended to be used like this
   X[ SJ(Y) ]
where X and Y are two data tables. What is the point of SJ? It seems similar to J except it also adds a key to its argument; however, is it
not the case that the key on Y will not be used since it has to
do a full scan of Y anyways?

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
_______________________________________________
datatable-help mailing list
[email protected]

https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help