Wow! You just saved me hours of computation. Now I can get rid of all my ddply calls! Many thanks!
May I ask something else? In your function you use the notation d[["y"]]. I tried d[ , "y" ] instead and got the error message "Non-numeric argument to mathematical function". However, if I use either notation inside sqrt directly on the command line, it works. So in that specific case, what is the difference between using d[["y"]] and d[, "y"]?

Many thanks again for your help.

Best,
David

On Thu, Jan 17, 2013 at 4:53 PM, Akhil Behl <[email protected]> wrote:

> If I am not wrong, you are looking for `.SD'. In fact you can put in
> the exact function you were throwing at ddply earlier. There are other
> special names like .SD that you can find in the data.table FAQs.
>
> Let's see:
> R> require(plyr)
> Loading required package: plyr
>
> R> require(data.table)
> Loading required package: data.table
> data.table 1.8.7 For help type: help("data.table")
>
> R> x.df <- data.frame(x=letters[1:2], y=1:10)
> R> x.dt <- data.table(x.df)
> R>
> R> my.func <- function (d) { # Define a function on the subset
> + sum(sqrt(d[["y"]]))
> + }
> R>
> R> # The plyr way:
> R> ddply(x.df, "x", my.func) -> ans.plyr
> R>
> R> # The data.table way:
> R> x.dt[ , my.func(.SD), by=x] -> ans.dt
> R>
> R> ans.plyr
>   x       V1
> 1 a 10.61387
> 2 b 11.85441
>
> R> ans.dt
>    x       V1
> 1: a 10.61387
> 2: b 11.85441
>
> For more help, try this on an R prompt:
>
> R> vignette('datatable-faq')
>
> --
> ASB.
>
> On Thu, Jan 17, 2013 at 9:49 PM, David Bellot <[email protected]> wrote:
> > Hi,
> >
> > I've been looking all around the web without a clear answer to this trivial
> > problem. I'm sure I'm not looking where I should:
> >
> > in fact, I want to replace my use of ddply from the plyr package by
> > data.table. One of my main use is to group a big data.frame by a group of
> > variable and do something on this sub data.frame:
> >
> > ddply( my_df, my_grouping_var, function (d) { do something with d } )
> > ----> d is a data.frame again
> >
> > and it's slow on big data.frame.
> >
> >
> > However, I don't really understand how to redo the same thing with a
> > data.table. Basically if "j" in a data.table is equivalent to the select
> > clause in SQL, then how do I do SELECT * FROM etc...
> >
> > I want to be able to pass a function like in ddply that will receive not
> > only a few columns but the full subset that is selected by the "by" clause.
> >
> > Thanks...
> > Best,
> > David
> >
> > _______________________________________________
> > datatable-help mailing list
> > [email protected]
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
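
On the d[["y"]] versus d[, "y"] question above, a minimal base R sketch may help. It deliberately does not load data.table, so it only shows the data.frame side and where the error text comes from. As far as I understand the data.table FAQ of the 1.8.x era, `[` on a data.table evaluates j as an expression, so inside my.func(.SD) the call d[, "y"] would yield the string "y" rather than the column; treat that as an assumption to check against your installed version. The names v1, v2, v3 and res are illustrative only.

```r
# Base R only; data.table is intentionally not loaded in this sketch.
d <- data.frame(x = letters[1:2], y = 1:10)

v1 <- d[["y"]]                # list-style extraction: the column itself
v2 <- d[, "y"]                # matrix-style indexing: a vector because drop = TRUE here
v3 <- d[, "y", drop = FALSE]  # same indexing without dropping: a one-column data.frame

# On a plain data.frame the two notations give the same vector,
# which is why both work with sqrt() on the command line.
stopifnot(identical(v1, v2))
stopifnot(is.data.frame(v3))

# Assumption: if a `[` method hands back the string "y" instead of the column,
# sqrt() receives character data and fails. Capture the message rather than stop.
res <- tryCatch(sqrt("y"), error = function(e) conditionMessage(e))
print(res)
```

Because `[[` extracts a column from any list-like object, d[["y"]] is likely the safer form inside a function that may receive either a data.frame or .SD.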
