> In this simple example, it took less than half a second to generate the
> result. That is on a 2.93 GHz MacBook Pro.
>
> So, for your data, the code would look something like this:
>
> system.time(DF.new <- do.call(rbind,
>     lapply(split(patch_summary, patch_summary$UniqueID),
>            function(x) x[sample(nrow(x), 1), ])))
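A minimal self-contained sketch of the quoted split/lapply/rbind approach, using invented toy data in place of the original patch_summary (the UniqueID column and values here are assumptions for illustration only):

```r
# Toy data standing in for patch_summary; UniqueID is the grouping column
set.seed(1)
patch_summary <- data.frame(
  UniqueID = rep(c("a", "b", "c"), each = 4),
  value    = rnorm(12)
)

# One random row per UniqueID: split into groups, sample one row from
# each group, then rbind the pieces back into a single data frame
DF.new <- do.call(rbind,
  lapply(split(patch_summary, patch_summary$UniqueID),
         function(x) x[sample(nrow(x), 1), ]))

nrow(DF.new)  # 3: one sampled row per group
```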
For large data, you can make it even faster with

sample_rows <- function(df, n) {
  df[sample(nrow(df), n), ]
}

library(plyr)
system.time(DF.new <- ddply(DF, "ID", sample_rows, n = 1))

ddply uses some tricks to avoid copying DF which really make a difference for large data (unfortunately they also increase the overhead, so it is currently slower for small data).

Hadley

--
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
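[Editor's example] The ddply snippet above is not self-contained (DF is the reader's own data). A runnable sketch with invented toy data, assuming the plyr package is installed:

```r
library(plyr)  # provides ddply

# Helper from the post: sample n rows from a data frame
sample_rows <- function(df, n) {
  df[sample(nrow(df), n), ]
}

# Toy data frame; the ID column is a stand-in for the real grouping variable
set.seed(1)
DF <- data.frame(ID = rep(1:3, each = 5), x = runif(15))

# ddply splits DF by ID, applies sample_rows (with n = 1) to each piece,
# and reassembles the results into one data frame
DF.new <- ddply(DF, "ID", sample_rows, n = 1)

nrow(DF.new)  # 3: one sampled row per ID
```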