Re: [datatable-help] datatable roll="next" takes 150 times longer than findInterval

Arunkumar Srinivasan Thu, 06 Feb 2014 05:54:41 -0800

Not really, because it is still doing a "by". That is, abs(x-y) will be
evaluated for every group in "by". If there are 1e5 groups, there'll be
1e5 calls, and that can be expensive depending on the function plus the
time to call eval from within C.


However, since it's not necessary to do a by-without-by here, we can perform
the join first and then compute the difference between the columns once.
There's no grouping, no eval from C, and no repeated calls to abs. Hope this
clears it up?
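
A minimal sketch of that join-then-compute approach, reusing the dtx/dty
setup from the quoted code below. The input vectors x and y are made up here
with sample() (assumed to be logical vectors), and yval is a hypothetical
helper column: it is added because, after the join, column y of the result
holds the looked-up values from dtx rather than the matched values from dty.

```r
library(data.table)

# Made-up inputs for illustration: two logical vectors.
set.seed(1)
n <- 1000
x <- sample(c(TRUE, FALSE), n, replace = TRUE)
y <- sample(c(TRUE, FALSE), n, replace = TRUE)

dtx <- data.table(x = which(x))
dty <- data.table(y = which(y), key = "y")

# Hypothetical helper column: a copy of the key, so that the matched y value
# is still available after the join.
dty[, yval := y]

# Step 1: rolling join only, with no j expression, so nothing is evaluated
# per group.
joined <- dty[dtx, roll = "nearest"]

# Step 2: a single vectorised abs() over the whole result.
# joined$y holds the values from dtx; joined$yval holds the nearest y.
d <- joined[, abs(y - yval)]
```

Here d has one entry per row of dtx, and abs is called exactly once.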


On Thu, Feb 6, 2014 at 2:45 PM, Gabor Grothendieck
<[email protected]> wrote:

> On Thu, Feb 6, 2014 at 8:23 AM, Arunkumar Srinivasan
> <[email protected]> wrote:
> > In this case? Then nothing'll be different.
> >
> > I'm not sure what you mean, because the problem here is that this *doesn't*
> > require *by-without-by*: the j-operations need not be performed *during*
> > the join. So we can just perform the join and then take the "abs" once at
> > the end, rather than calling it about 1e5+ times (the number of groups).
> >
> > So, if your question is: "apart from this question, what would an explicit
> > by-without-by look like?", then I guess it'd be the same as the normal
> > aggregation, but "by" would take a data.table as well. This has not yet been
> > discussed or conceptualised. But this is how I imagine it to be:
> >
> > DT1[, list(...), by=DT2]
> >
> > Where, DT1's key columns have to be set as usual.
>
> My original code was this:
>
> dtx <- data.table(x = which(x))
> dty <- data.table(y = which(y), key = "y")
> dty[dtx, abs(x - y), roll = "nearest"]
>
> With that feature, would this code not use by-without-by and therefore
> become fast?
>

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
