Re: [datatable-help] datatable roll="next" takes 150 times longer than findInterval

Arunkumar Srinivasan Thu, 06 Feb 2014 06:59:24 -0800

Gabor,

I think I now understand what your earlier post was about. You mean that,
once external by-without-by is implemented, DT1[DT2, ...] will be faster
because it won't do a by-without-by. Yes, that's true. So, basically, the
statement:

dty[dtx, abs(x - y), roll = "nearest"]

will (or should), once external by-without-by is implemented, first do the
join and then evaluate "j" once. It will therefore be as fast as the
solution I wrote. If one wants to evaluate j for each group, one will have
to write something like

DT1[, j, by=DT2] (or whatever syntax we end up settling on)
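To make the "join first, then compute once" idea concrete without relying on data.table internals, here is a minimal base-R sketch built on findInterval (the function this thread benchmarks against). The vectors are made up for illustration, y is assumed to be sorted ascending, and the nearest-neighbour choice only approximates roll = "nearest"; it is not data.table's implementation, and its tie-breaking may differ.

```r
# Hypothetical example data (not from the thread); y must be sorted ascending
x <- c(0.2, 1.6, 2.9, 7.5)
y <- c(1, 2, 3, 5, 8)

# Index of the largest y <= x (0 when x is below min(y)): a last-observation-carried-forward match
lo <- findInterval(x, y)

# Neighbouring candidates on either side, clamped to valid indices of y
lo_idx <- pmax(lo, 1L)
hi_idx <- pmin(lo + 1L, length(y))

# Keep whichever neighbour is closer; ties go to the lower one (an arbitrary choice here)
nearest <- ifelse(abs(x - y[lo_idx]) <= abs(y[hi_idx] - x), y[lo_idx], y[hi_idx])

# The "j" step: a single vectorised abs() over the whole joined result,
# with no per-group evaluation and no repeated calls from C
d <- abs(x - nearest)
print(d)
```

Because the match is resolved for all rows first, abs() runs once on full vectors instead of once per group, which is the cost difference discussed in this thread.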

Sorry for the misunderstanding.


On Thu, Feb 6, 2014 at 3:20 PM, Gabor Grothendieck
<[email protected]> wrote:

> On Thu, Feb 6, 2014 at 8:53 AM, Arunkumar Srinivasan
> <[email protected]> wrote:
> > Not really, because it's still doing a "by". That is, for every group in
> > "by", abs(x - y) will be evaluated. If there are 1e5 groups, there'll be
> > 1e5 calls, which can be expensive depending on the function, plus the
> > overhead of calling eval from within C.
> >
> > However, since a by-without-by isn't necessary here, we can perform the
> > join and then compute the difference between the columns once. There's
> > no grouping, no eval from C, and no repeated calls to abs. Hope this
> > clears it up?
> >
> >
>
> In that case what is the proposed user interface?
>
> I thought that the idea was that one would have to explicitly specify
> the by= clause for by-without-by to occur. In the code I had just
> posted there is a join with roll = "nearest", but no by= clause is
> specified.
>

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help