Gabor,
That said about it being a microbenchmark, by-without-by might be
at play in GG2(X,Y) here; i.e. j is run once for each row of i, where it
could run just once overall. I remember you and others quite rightly said
by-without-by should be explicit ... I've still got to make that change. A
similar speed issue came up recently somewhere else as well, which the
change in default should help with.
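To illustrate what by-without-by means, here is a minimal sketch on made-up data. The tables X and Y and the columns id and v below are hypothetical and unrelated to the X and Y in the benchmark. It assumes a data.table release in which per-row grouping has to be requested explicitly with by=.EACHI; in releases where by-without-by is still the default, the plain join form behaves like the second call instead.

```r
library(data.table)

# Hypothetical toy data, not the X and Y from the benchmark.
X <- data.table(id = c(1L, 1L, 2L, 2L, 3L),
                v  = c(10, 20, 30, 40, 50),
                key = "id")
Y <- data.table(id = c(1L, 2L))

# j evaluated once, over all rows of X matched by any row of Y.
once <- X[Y, sum(v)]

# j evaluated separately for each row of Y (the by-without-by behaviour).
each <- X[Y, sum(v), by = .EACHI]

print(once)
print(each)
```

When i has many rows and j is cheap, the per-row form pays the overhead of evaluating j many times, which is the kind of cost that shows up in a microbenchmark.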
Matt
On 02/02/14 18:57, Matt Dowle wrote:
But this is at the *micro*second level ?!!
I confirm those results on my slow netbook, but remember these are
**micro**seconds, i.e. 71,000 here is less than 0.1 of a second.
> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
Unit: microseconds
         expr       min        lq      median          uq       max neval
 flodel(X, Y)   330.798   369.369    402.7935    455.3225  17996.26   100
    GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
    GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100
To put it in some perspective :
> system.time(GG2(X,Y))
   user  system elapsed
  0.072   0.000   0.072
> system.time(GG2(X,Y))
   user  system elapsed
  0.080   0.000   0.079
> system.time(GG2(X,Y))
   user  system elapsed
  0.072   0.000   0.072
Those times are in seconds. So the task in question here
takes 0.07 seconds ?!
The 150x longer figure is actually (using figures from the S.O.
answer) 24695 microseconds (i.e. about 0.025 seconds) divided by 168
microseconds (0.000168 seconds): 0.024695 / 0.000168 = 147, hence "nearly
150 times". If you rounded to milliseconds you could say data.table is
infinitely slower (25ms / 0ms = Inf).
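The arithmetic can be checked directly in R. This is only a sketch of the unit conversions; the variable names are made up for illustration, and the two timings are the ones quoted from the S.O. answer.

```r
dt_us   <- 24695   # data.table timing from the S.O. answer, in microseconds
base_us <- 168     # base findInterval() timing, in microseconds

# Ratio of the two timings: roughly 147, i.e. "nearly 150 times".
ratio <- dt_us / base_us

# The same timings expressed in seconds.
dt_seconds   <- dt_us / 1e6
base_seconds <- base_us / 1e6

# Rounded to whole milliseconds the base timing becomes 0,
# and dividing a positive number by 0 gives Inf in R.
ratio_ms <- round(dt_us / 1000) / round(base_us / 1000)

print(ratio)
print(c(dt_seconds, base_seconds))
print(ratio_ms)
```

The ratio is the same whatever the unit, but the absolute difference is under 0.025 seconds per call.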
I can believe there's scope for improvement, sure, but not from this
benchmark. The vectors need to be *much* bigger and the number of
replications needs to be *much* smaller, say 3. The task being timed needs
to take a meaningful amount of time (say 5 seconds) *for a single run*.
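A benchmark along those lines might look like the sketch below. The helper time_runs(), the vector size n and the stand-in task are all assumptions for illustration; the stand-in uses base findInterval() on random data rather than the actual flodel, GG1 or GG2 functions, which are not reproduced here.

```r
# Hypothetical helper: time a small number of single runs of a task.
time_runs <- function(f, runs = 3) {
  vapply(seq_len(runs),
         function(i) system.time(f())[["elapsed"]],
         numeric(1))
}

set.seed(1)
n <- 1e6                   # assumed size; increase until one run takes seconds
breaks <- sort(runif(n))   # findInterval() needs sorted break points
x <- runif(n)

# Stand-in task; replace with the function actually being compared.
elapsed <- time_runs(function() findInterval(x, breaks))
print(elapsed)
```

Each element of elapsed is the wall-clock time in seconds of one complete run, so differences between methods are read off in seconds rather than microseconds.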
Matt
On 02/02/14 12:27, Gabor Grothendieck wrote:
The benchmark at the bottom of this post shows a problem where a
data.table roll="next" took nearly 150x longer than a base
findInterval() solution. (The data.table solution is easier to write
though.) This suggests an area for possible speed improvement.
http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help