Hi,

On Wed, Apr 16, 2014 at 9:41 AM, Arunkumar Srinivasan <[email protected]> wrote:
> Clayton,
>
> Thanks for posting it here. Here's the first follow-up. Here's an example:
>
> require(data.table) ## 1.9.3 comm 1263
> dt <- data.table(x=1:1e7, y=1:1e7)
>
> ## data.table optimisation removes names
> system.time(ans1 <- dt[, list(z=y), by=x])
>
> #   user  system elapsed
> #  7.193   0.275   7.859
>
> ## data.table can't optimise to remove names
> foo <- function(x) list(z=x)
> system.time(ans2 <- dt[, foo(y), by=x])
> #   user  system elapsed
> # 16.020   0.179  16.411
>
> > identical(ans1, ans2)
> [1] TRUE
>
> This is without checking for names, for each of the 1e7 groups.
Do you think the ~2x difference in speed really comes from an optimization based on the "names" thing, or from the mechanics required to invoke a function within each group in the second example?

-steve

--
Steve Lianoglou
Computational Biologist
Genentech
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
