Hi,
I don't use SJ very much admittedly. ?SJ says it's for :
DT[SJ(...)]
where ... has :
"Each argument is a vector. Generally each vector is the same length
but if they are not then usual silent repetition is applied."
So it's not really for :
X[ SJ(Y) ]
since
X[Y]
is already that. Or maybe other ways I use sometimes :
X[setkey(Y)]
or
X[setkey(Y,...)]
or
X[setkey(copy(Y),...)]
So SJ() is more for constructing a data.table from vectors, in the
spirit of J() originally being a mere alias for data.table().
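To make the forms above concrete, here is a small sketch. The tables X and Y and their contents are made up for illustration; only X[Y], setkey(), copy() and SJ() come from the discussion above.

```r
library(data.table)

# Hypothetical data for illustration only.
X <- data.table(id = c(1L, 2L, 3L), value = c("a", "b", "c"), key = "id")
Y <- data.table(id = c(3L, 1L))        # an unkeyed data.table

r_plain <- X[Y]                        # join directly; rows follow Y's order
r_keyed <- X[setkey(copy(Y), id)]      # key a copy, so Y itself is not reordered

i <- SJ(c(3L, 1L))                     # build a sorted, keyed table from a vector

print(r_plain$id)
print(r_keyed$id)
print(haskey(Y))                       # Y is untouched because we keyed a copy
print(haskey(i))
```

setkey() sorts its argument by reference, which is why the copy() variant is worth having when Y's original row order still matters.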
Let's say you have randomly ordered ids in vector 'ids' and X is keyed
by id.
X[J(ids)]   # look up data and return it in the same order as ids
            # (each lookup is a new binary search)
X[SJ(ids)]  # sort ids first, binary merge (a bit faster if i is keyed
            # too), and return data in sorted order, keyed by id too
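A runnable version of that comparison might look like the following. The contents of X and ids are invented for the example.

```r
library(data.table)

# Hypothetical example data: X is keyed by id.
X <- data.table(id = c(1L, 2L, 3L, 4L),
                value = c("a", "b", "c", "d"),
                key = "id")
ids <- c(3L, 1L, 4L)   # lookup ids in arbitrary order

r_j  <- X[J(ids)]      # rows come back in the order of ids
r_sj <- X[SJ(ids)]     # ids are sorted first, so rows come back sorted by id

print(r_j$id)
print(r_sj$id)
print(key(r_sj))       # lets you check whether the result kept a key
```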
That's the idea anyway. If I'm not sure whether the input vector is
sorted, I'll sometimes use SJ() just to make sure. There may be a
shortcut in there that calls is.unsorted first to save the cost of
sorting (and if not, there probably should be).
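The kind of shortcut meant here could be sketched at user level as below. sorted_lookup is a hypothetical helper, not something in data.table, and this says nothing about what SJ() actually does internally. It assumes ids contains no NAs, since is.unsorted() can return NA in that case.

```r
library(data.table)

# Hypothetical helper: only pay for the sort when ids is not already sorted.
# Assumes ids has no NA values.
sorted_lookup <- function(X, ids) {
  if (is.unsorted(ids)) {
    X[SJ(ids)]   # sort ids, then join
  } else {
    X[J(ids)]    # already in order, so skip the sort
  }
}

# Made-up data to exercise both branches.
X <- data.table(id = 1:4, value = c("a", "b", "c", "d"), key = "id")
res_unsorted <- sorted_lookup(X, c(3L, 1L))
res_sorted   <- sorted_lookup(X, c(1L, 3L))
print(res_unsorted$id)
print(res_sorted$id)
```

Either way the rows come back in sorted id order; the difference is only whether the sort is run.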
X must be keyed. Y having a key is optional, but if Y has a key too it
will take advantage of it. Obviously speed differences will depend on
many factors including the number of rows in Y, the number of columns in
the join, the number of rows in X and the number of rows in the result.
And there is a known potential performance improvement in this area
(i.e. when both X and Y are keyed), although quite a bit was done
already last year in particular for character vector joins. [Types make
a large difference in benchmarks.]
Matthew
On 30.06.2013 15:04, Gabor Grothendieck wrote:
Consider SJ which I assume was intended to be used like this
X[ SJ(Y) ]
where X and Y are two data tables. What is the point of SJ? It seems
similar to J except it also adds a key to its argument; however, is it
not the case that the key on Y will not be used since it has to
do a full scan of Y anyways?
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help