Chris,

You're 100% right. That's what we've discussed with Hadley as well. For this 
data, we decided to stick with this approach, as we weren't lagging behind dplyr.
This is also why I made the point that "However, when benchmarking one should 
be benchmarking the equivalent of an operation in each tool, not how one thinks 
the design should be."
This is so that the next time we benchmark, we can do it the data.table way and 
the dplyr way, and not dplyr's data.table way.


Arun
From: Chris Neff
Reply: Chris Neff [email protected]
Date: January 22, 2014 at 9:17:49 PM
To: Arunkumar Srinivasan [email protected]
Subject:  Re: [datatable-help] Response to dplyr baseball vignette benchmarks  
When you do use larger data sets where it will matter, I think more strongly 
highlighting the in-place vs. copying differences will be key. Yes, you should 
compare things as closely as possible when doing standard benchmarking, but I 
think mimicking dplyr's copying sells data.table a bit short. You show this a 
bit in the mutate example, but even in the arrange example the copy is slowing 
things down. It is so small that it wouldn't really make a ton of difference in 
this case, but with 10m rows the copying gets to be a large, noticeable 
difference between data.table and standard data.frame methods, like setnames 
vs. names<-.
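[Editor's note: a minimal R sketch of the setnames vs. names<- distinction 
Chris mentions, added for illustration. The exact copying behavior of names<- 
depends on the R version; timings will vary by machine.]

```r
library(data.table)

n  <- 1e7
DT <- data.table(a = seq_len(n))
DF <- data.frame(a = seq_len(n))

# data.table: setnames() updates the column name by reference,
# without copying the 10m-row table
system.time(setnames(DT, "a", "x"))

# base R: the names<- replacement function can copy the whole
# data.frame just to change one name
system.time(names(DF)[1] <- "x")
```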




On Wed, Jan 22, 2014 at 3:09 PM, Arunkumar Srinivasan <[email protected]> 
wrote:
Chris,

Thanks. Yes that's the plan (the last line in the link). Once the next version 
of data.table is out on CRAN, the benchmarks should come out.

Arun
From: Chris Neff
Reply: Chris Neff [email protected]
Date: January 22, 2014 at 9:07:34 PM
To: Arunkumar Srinivasan [email protected]
Subject:  Re: [datatable-help] Response to dplyr baseball vignette benchmarks
Thank you for responding to this so quickly to get out ahead of the misleading 
aspects.

As another comparison, it would definitely be constructive to also use a data 
set that is larger than 10 MB.  Something in the 1m+ row range perhaps.


On Wed, Jan 22, 2014 at 2:54 PM, Arunkumar Srinivasan <[email protected]> 
wrote:
Hello,

Matthew and I have redone the benchmarks and posted a response to dplyr's 
baseball vignette benchmark here: 
http://arunsrinivasan.github.io/dplyr_benchmark/

Have a look and let us know what you think!

Arun

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

