Once you do use larger data sets, where it will matter, I think highlighting the in-place vs. copying differences more strongly will be key. Yes, a standard benchmark should compare things as closely as possible, but I think mimicking dplyr by copying sells data.table a bit short. You show this a bit in the mutate example, but the copy is slowing things down even in the arrange example. The data here is so small that it doesn't make much difference, but with 10m rows the copying becomes a large, noticeable difference between data.table's by-reference functions and standard data.frame methods, e.g. setnames vs. names<-.
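To make the in-place vs. copying point concrete, here is a minimal sketch of what I mean. The table and its column names (id, val, value2) are made up for illustration, and it only shows which operations touch the original object; it makes no timing claims, and how much base R copies on names<- depends on the R version.

```r
library(data.table)

# Hypothetical toy table; the real difference only shows up at millions of rows.
DT <- data.table(id = 1:5, val = c(3, 1, 2, 5, 4))

# In place: setnames() and := modify DT by reference, so no copy is made.
before <- address(DT)
setnames(DT, "val", "value")
DT[, value2 := value * 2]
same_object <- identical(before, address(DT))

# Copying: copy() duplicates the whole table, which is roughly what a
# benchmark pays for when it mimics dplyr's copy-on-modify behaviour.
DT2 <- copy(DT)
DT2[, value := 0]   # changes DT2 only; DT keeps its original values
```

Here same_object tells you whether DT is still the same object after the rename and the new column, and DT2 is a separate table you can change freely without touching DT.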
On Wed, Jan 22, 2014 at 3:09 PM, Arunkumar Srinivasan <[email protected]> wrote:

> Chris,
>
> Thanks. Yes that's the plan (the last line in the link). Once the next
> version of data.table is out on CRAN, the benchmarks should come out.
>
> Arun
> ------------------------------
> From: Chris Neff <[email protected]>
> Reply: Chris Neff [email protected]
> Date: January 22, 2014 at 9:07:34 PM
> To: Arunkumar Srinivasan [email protected]
> Subject: Re: [datatable-help] Response to dplyr baseball vignette benchmarks
>
> Thank you for responding to this so fast to get out ahead of the
> misleading aspects.
>
> As another comparison, it would definitely be constructive to also use a
> data set that is larger than 10 MB. Something in the 1m+ row range perhaps.
>
> On Wed, Jan 22, 2014 at 2:54 PM, Arunkumar Srinivasan <[email protected]> wrote:
>
>> Hello,
>>
>> Matthew and I have redone the benchmarks and posted a response to the
>> dplyr's baseball vignette benchmark here:
>> http://arunsrinivasan.github.io/dplyr_benchmark/
>>
>> Have a look and let us know what you think!
>>
>> Arun
>>
>> _______________________________________________
>> datatable-help mailing list
>> [email protected]
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
