Interesting. Well asked.
On my netbook:

> Rprof()
> system.time(do.call(cbind, lst.USArrests.dt))
   user  system elapsed
  4.008   0.000   4.012
> Rprof(NULL)
> summaryRprof()
$by.self
                self.time self.pct total.time total.pct
"make.names"         1.82    44.39       1.82     44.39
"data.table"         1.74    42.44       4.00     97.56
"[[.data.frame"      0.12     2.93       0.26      6.34
"gc"                 0.10     2.44       0.10      2.44
"match"              0.08     1.95       0.10      2.44
"length"             0.06     1.46       0.06      1.46
"[["                 0.04     0.98       0.30      7.32
"%in%"               0.04     0.98       0.14      3.41
"NROW"               0.02     0.49       0.12      2.93
"is.data.frame"      0.02     0.49       0.02      0.49
"names"              0.02     0.49       0.02      0.49
"paste"              0.02     0.49       0.02      0.49
"sys.call"           0.02     0.49       0.02      0.49

So almost half of it is in make.names() [notice that cbind.data.frame calls data.frame with check.names=FALSE] and the other half in data.table() but not sure exactly where.

So we can do better, or maybe we need a cbindlist (analogous to the existing rbindlist). But as you allude, we've spent most effort on := and set() to add columns by reference rather than copying using cbind(). I've added a feature request to tackle this anyway. Thanks for highlighting, great test.

https://r-forge.r-project.org/tracker/?group_id=240&atid=978&func=detail&aid=2636

Matthew

On 22.03.2013 22:23, Sadao Milberg wrote:
> I've recently discovered the dramatic performance improvements data.table provides over ddply() and merge(), and I'm looking forward to integrating it into my work. While messing around with benchmarks, I ran into an unexpected outcome with cbind(), where operations are actually much faster with data frames than data tables.
> Don't ask me why I'd ever do the following, but I am curious as to why it is so much slower:
>
> USArrests.dt
> lst.USArrests
> lst.USArrests.dt
>
> microbenchmark(do.call(cbind, lst.USArrests),
>                do.call(cbind, lst.USArrests.dt),
>                times=10)
>
> Unit: milliseconds
>                              expr       min        lq    median        uq       max neval
>     do.call(cbind, lst.USArrests)  42.26891  47.70086  48.71271  49.88542  51.25453    10
>  do.call(cbind, lst.USArrests.dt) 750.70469 761.70511 773.91232 816.85707 880.45896    10
>
> This is run on an Ubuntu system.
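
For illustration, here is a minimal sketch of the by-reference approach the reply mentions (set() and := instead of cbind()). The definitions of the objects in the original post are not shown in the thread, so the list built here (`lst`, one single-column data.table per column of USArrests) is an assumption made only for the example, as are the names `out` and `UrbanShare`. It shows the technique, not a measured speed-up.

```r
library(data.table)

# Assumed stand-in data: the thread does not show how lst.USArrests.dt was
# built, so split USArrests into a list of one-column data.tables.
dt  <- as.data.table(USArrests)
lst <- lapply(names(dt), function(nm) dt[, nm, with = FALSE])

# Take a copy of the first piece, then append the remaining columns by
# reference with set(), rather than calling do.call(cbind, lst).
out <- copy(lst[[1]])
for (piece in lst[-1]) {
  for (nm in names(piece)) {
    set(out, j = nm, value = piece[[nm]])
  }
}

# := adds a single derived column by reference in the same way.
out[, UrbanShare := UrbanPop / 100]
```

Both set() and := modify `out` in place, so no intermediate copies of the table are made while the columns are added. The copy() at the start keeps the first list element itself unchanged.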
_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
