Gabor,

That said, beyond it being a micro benchmark, by-without-by might be at play in GG2(X,Y) here; i.e. j is evaluated once for each row of i, when it could be evaluated just once. I remember you and others quite rightly said by-without-by should be explicit; I've still got to make that change. A similar speed issue came up recently elsewhere as well, which the change in default should also help.

Matt

On 02/02/14 18:57, Matt Dowle wrote:

But this is at the *micro*second level?!

I can confirm those results on my slow netbook, but remember these are **micro**seconds, i.e. 71,000 here is less than 0.1 of a second.

> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
Unit: microseconds
         expr       min        lq      median          uq max neval
 flodel(X, Y)   330.798   369.369    402.7935    455.3225 17996.26   100
    GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
    GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100

To put it in perspective:

> system.time(GG2(X,Y))
   user  system elapsed
  0.072   0.000   0.072
> system.time(GG2(X,Y))
   user  system elapsed
  0.080   0.000   0.079
> system.time(GG2(X,Y))
   user  system elapsed
  0.072   0.000   0.072

Those times are in seconds. So the task in question here takes 0.07 seconds?!

The "150x longer" figure is actually (using the figures from the S.O. answer) 24,695 microseconds (about 0.025 seconds) divided by 168 microseconds (0.000168 seconds), which comes to about 147, i.e. "nearly 150 times". If you rounded to milliseconds you could say data.table is infinitely slower (25ms / 0ms = Inf).
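Working the same two S.O. timings as both a ratio and an absolute difference makes the point explicit. The difference line is added here for illustration and is not part of the S.O. answer: the ratio is large, but the absolute cost is about a fortieth of a second.

```latex
\frac{24695\ \mu\text{s}}{168\ \mu\text{s}} \approx 147
\qquad\text{but}\qquad
24695\ \mu\text{s} - 168\ \mu\text{s} = 24527\ \mu\text{s} \approx 0.025\ \text{s}
```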

I can believe there's scope for improvement, sure, but this benchmark doesn't demonstrate it. The vectors need to be *much* bigger and the number of replications *much* smaller, say 3. The task being timed needs to take a meaningful amount of time (say 5 seconds) *for a single run*.

Matt


On 02/02/14 12:27, Gabor Grothendieck wrote:
The benchmark at the bottom of this post shows a problem where a data.table roll="next" took nearly 150x longer than a base findInterval() solution. (The data.table solution is easier to write though.) This suggests an area for possible speed improvement.

http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

