There was another benchmark posted with larger data and longer run times, but this time data.table stopped with an error. See:
http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855

On Mon, Feb 3, 2014 at 6:46 AM, Matt Dowle <[email protected]> wrote:
> Gabor,
>
> With that said about it being a micro benchmark, by-without-by might be at
> play in GG2(X,Y) here; i.e. running j for each row of i, where it could run
> once. I remember you and others quite rightly said by-without-by should be
> explicit ... still got to make that change. A similar speed issue came up
> recently somewhere else as well, which the change in default should help.
>
> Matt
>
>
> On 02/02/14 18:57, Matt Dowle wrote:
>
> But this is at the *micro*second level?!
>
> I confirm those results on my slow netbook, but remember these are *micro*
> seconds, i.e. 71,000 here is less than 0.1 of a second.
>
>> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
> Unit: microseconds
>          expr       min         lq      median          uq       max neval
>  flodel(X, Y)   330.798    369.369    402.7935    455.3225  17996.26   100
>     GG1(X, Y) 14287.380  14370.038  14466.5990  16010.5440 121082.77   100
>     GG2(X, Y) 71164.270  85751.437 107951.3415 161676.5720 366003.62   100
>
> To put it in some perspective:
>
>> system.time(GG2(X,Y))
>    user  system elapsed
>   0.072   0.000   0.072
>> system.time(GG2(X,Y))
>    user  system elapsed
>   0.080   0.000   0.079
>> system.time(GG2(X,Y))
>    user  system elapsed
>   0.072   0.000   0.072
>
> where those times are in seconds. So the task in question here takes
> 0.07 seconds?!
>
> The 150x-longer figure is actually (using figures from the S.O. answer)
> 24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds
> (0.000168 seconds): 0.024 seconds / 0.000168 seconds = "150 times". If you
> rounded to milliseconds you could say data.table is infinitely slower
> (24ms / 0ms = Inf).
>
> I can believe there's scope for improvement, sure, but not from this
> benchmark. The vectors need to be *much* bigger and the number of
> replications *much* smaller, say 3. The task being timed needs to take a
> meaningful amount of time (say 5 seconds) *for a single run*.
>
> Matt
>
>
> On 02/02/14 12:27, Gabor Grothendieck wrote:
>
> The benchmark at the bottom of this post shows a problem where a data.table
> roll="next" join took nearly 150x longer than a base findInterval() solution.
> (The data.table solution is easier to write, though.) This suggests an area
> for possible speed improvement.
>
> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
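
For context, a minimal sketch of the kind of timing Matt recommends: a single run on vectors large enough that elapsed time is meaningful, measured with system.time() rather than many microbenchmark replications. This is not code from the thread or from the Stack Overflow answer; the sizes, seed, and the column name "pos" are illustrative assumptions.

library(data.table)

set.seed(1)
n <- 1e7                                  # large enough that one run takes meaningful time
x <- sort(sample(1e9, n))                 # reference positions (sorted, no duplicates)
y <- sort(sample(1e9, n))                 # lookup positions

# Base R: for each y, findInterval() returns the index of the largest x <= y.
system.time(idx_base <- findInterval(y, x))

# data.table: the equivalent rolling join; roll = Inf rolls the previous x
# forward, and which = TRUE returns the matched row numbers of DX.
DX <- data.table(pos = x, key = "pos")
DY <- data.table(pos = y)
system.time(idx_dt <- DX[DY, roll = Inf, which = TRUE])

# The two agree except where y precedes every x:
# findInterval() gives 0 there, the join gives NA.

Here roll = Inf mirrors findInterval()'s previous-value semantics; a roll = -Inf join would look up the following value instead, which is closer in spirit to the roll = "next" lookup discussed in the linked answer.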
