Have edited here now: http://stackoverflow.com/a/21500855/559784
On Wed, Feb 5, 2014 at 4:42 PM, Arunkumar Srinivasan <[email protected]> wrote:

> Seems like the "by-without-by" is what's slowing things down:
>
> require(data.table)
> dtx <- data.table(x=which(X), key="x")
> dty <- data.table(y=which(Y), key="y")
> dtx[, x1 := x]
> dty[, y1 := y]
> system.time(ans <- dty[dtx, roll="nearest"][, abs(x1-y1)])
>    user  system elapsed
>   1.321   0.076   1.396
> system.time(ans2 <- flodel(x,y))
>    user  system elapsed
>   0.936   0.044   0.977
>
> identical(ans, ans2) # [1] TRUE
>
>
> On Wed, Feb 5, 2014 at 4:32 PM, Arunkumar Srinivasan <[email protected]> wrote:
>
>> Just tested. Works just fine (on 1.8.11). Takes 16 seconds as opposed to
>> Flodel's which takes 1.4 seconds on my laptop. Also identical returned TRUE.
>> Will see where's the delay coming from.
>>
>>
>> On Wed, Feb 5, 2014 at 4:22 PM, Gabor Grothendieck <[email protected]> wrote:
>>
>>> There was another benchmark posted with larger data and longer times
>>> but this time data.table stopped with an error. See:
>>>
>>> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>>>
>>> On Mon, Feb 3, 2014 at 6:46 AM, Matt Dowle <[email protected]> wrote:
>>> > Gabor,
>>> >
>>> > With that said about it being a micro benchmark, by-without-by might be at
>>> > play in GG2(X,Y) here; i.e. running j for each row of i, where it could run
>>> > once. I remember you and others quite rightly said by-without-by should be
>>> > explicit ... still got to make that change. A similar speed issue came up
>>> > recently somewhere else as well which the change in default should help.
>>> >
>>> > Matt
>>> >
>>> >
>>> > On 02/02/14 18:57, Matt Dowle wrote:
>>> >
>>> >
>>> > But this is at the *micro* second level ?!!
>>> >
>>> > I confirm those results on my slow netbook but remember these are **micro**
>>> > seconds i.e. 71,000 here is less than 0.1 of a second.
>>> >
>>> >> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
>>> > Unit: microseconds
>>> >          expr       min        lq      median          uq       max neval
>>> >  flodel(X, Y)   330.798   369.369    402.7935    455.3225  17996.26   100
>>> >     GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
>>> >     GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100
>>> >
>>> > To put it in some perspective :
>>> >
>>> >> system.time(GG2(X,Y))
>>> >    user  system elapsed
>>> >   0.072   0.000   0.072
>>> >> system.time(GG2(X,Y))
>>> >    user  system elapsed
>>> >   0.080   0.000   0.079
>>> >> system.time(GG2(X,Y))
>>> >    user  system elapsed
>>> >   0.072   0.000   0.072
>>> >
>>> > Where those times are in seconds. So the task in question here, takes
>>> > 0.07 seconds ?!
>>> >
>>> > The 150x longer figure is actually (using figures from the S.O. answer)
>>> > 24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds
>>> > (0.000168 seconds). 0.024 seconds / 0.000168 = "150 times". If you
>>> > rounded to milliseconds you could say data.table is infinitely slower
>>> > (24ms / 0ms = Inf).
>>> >
>>> > I can believe there's scope for improvement, sure, but not from this
>>> > benchmark. The vectors need to be *much* bigger and replications need to be
>>> > *much* smaller, say 3. The task being timed needs to take a meaningful
>>> > amount of time (say 5 seconds) *for a single run*.
>>> >
>>> > Matt
>>> >
>>> >
>>> > On 02/02/14 12:27, Gabor Grothendieck wrote:
>>> >
>>> > The benchmark at the bottom of this post shows a problem where a data.table
>>> > roll="next" took nearly 150x longer than a base findInterval() solution.
>>> > (The data.table solution is easier to write though.) This suggests an area
>>> > for possible speed improvement.
>>> >
>>> > http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>>> >
>>> > --
>>> > Statistics & Software Consulting
>>> > GKX Group, GKX Associates Inc.
>>> > tel: 1-877-GKX-GROUP
>>> > email: ggrothendieck at gmail.com
>>> >
>>> > _______________________________________________
>>> > datatable-help mailing list
>>> > [email protected]
>>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>> --
>>> Statistics & Software Consulting
>>> GKX Group, GKX Associates Inc.
>>> tel: 1-877-GKX-GROUP
>>> email: ggrothendieck at gmail.com
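For anyone reading the thread later: the quoted messages compare the data.table rolling join against a base findInterval() approach and suggest timing with much larger vectors and only about 3 replications. Below is a minimal sketch of that idea in base R. It is not flodel's code and not the code from the S.O. answer; the helper name nearest_dist and the vector length n are assumptions made up for illustration.

```r
# Hypothetical helper, not code from the thread: for each TRUE in X, the
# distance to the nearest TRUE in Y, using base R's findInterval().
nearest_dist <- function(X, Y) {
  x <- which(X)
  y <- which(Y)                        # which() returns sorted positions
  stopifnot(length(y) > 0L)
  idx <- findInterval(x, y)            # count of y values <= each x (0 if none)
  lo <- y[ifelse(idx == 0L, NA, idx)]  # nearest y at or below x, NA if none
  hi <- y[idx + 1L]                    # nearest y above x, NA past the end
  pmin(x - lo, hi - x, na.rm = TRUE)
}

# Small correctness check against a brute-force version.
set.seed(1)
Xs <- sample(c(TRUE, FALSE), 200, replace = TRUE)
Ys <- sample(c(TRUE, FALSE), 200, replace = TRUE)
brute <- vapply(which(Xs), function(i) min(abs(i - which(Ys))), integer(1))
stopifnot(isTRUE(all.equal(nearest_dist(Xs, Ys), brute)))

# Timing in the spirit of the advice above: big vectors, about 3 replications.
# n is an assumed size; raise it until a single run takes a meaningful time.
n <- 1e6
X <- sample(c(TRUE, FALSE), n, replace = TRUE)
Y <- sample(c(TRUE, FALSE), n, replace = TRUE)
for (i in 1:3) print(system.time(nearest_dist(X, Y)))
```

The same X and Y can then be fed to the data.table version quoted above (dty[dtx, roll="nearest"]) so that both approaches are timed on identical input, one run at a time, rather than through 100 microsecond-scale replications.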
