Indeed, it makes sense now: what is passed to the function is a data.table, not a data.frame.
Thanks guys for your help. Now I'm a convinced data.table user.

Best,
David

On Thu, Jan 17, 2013 at 5:25 PM, Akhil Behl <[email protected]> wrote:
> Hey David,
>
> I thought your problem may have been a typo, but I realized that it is
> in fact a subtle difference between the way data.table and data.frame
> work.
>
> One must provide unquoted names in the `j' expression for a
> data.table, i.e. one can say x.dt[ , y] but not x.dt[ , "y"] (which
> will evaluate to just "y" and hence the error).
>
> There are tricks around it like using with=FALSE, or using the
> data.frame notation x.dt[["y"]]. But once again, you will find such
> examples and explanations of idiomatic data.table expressions in the
> vignettes.
>
> --
> ASB.
>
> On Thu, Jan 17, 2013 at 10:42 PM, David Bellot <[email protected]> wrote:
> > Hi Matthew,
> >
> > I did read the introduction, but I wasn't sure about the way to write it.
> > Hence my question.
> >
> > In fact, I would agree if the function were just sum(sqrt(y)), but in my
> > case, I would like to do something like
> >
> > f <- function(d) head(d,1)
> >
> > It's a small example for the sake of simplicity, just to illustrate that I
> > really want to have access to the full sub data.frame (the d variable) and
> > not just one column.
> >
> > Best,
> > David
> >
> > On Thu, Jan 17, 2013 at 5:07 PM, Matthew Dowle <[email protected]> wrote:
> >>
> >> Akhil,
> >>
> >> Kind of, but defining :
> >>
> >> my.func <- function (d) {
> >>     sum(sqrt(d[["y"]]))
> >> }
> >>
> >> followed by
> >>
> >> x.dt[ , my.func(.SD), by=x]
> >>
> >> isn't very data.table'ish. In fact the
> >> advice is to avoid .SD if possible, for speed.
> >>
> >> We'd forget my.func, and just do :
> >>
> >> x.dt[, sum(sqrt(y)), by=x]
> >>
> >> That is how we recommend it to be used, and
> >> allows data.table to optimize the query (which
> >> use of .SD may prevent).
> >>
> >> David - have you read the introduction vignette and have
> >> you worked through example(data.table) at the prompt?
> >>
> >> Matthew
> >>
> >> On 17.01.2013 16:53, Akhil Behl wrote:
> >>>
> >>> If I am not wrong, you are looking for `.SD'. In fact you can put in
> >>> the exact function you were throwing at ddply earlier. There are other
> >>> special names like .SD that you can find in the data.table FAQs.
> >>>
> >>> Let's see:
> >>> R> require(plyr)
> >>> Loading required package: plyr
> >>>
> >>> R> require(data.table)
> >>> Loading required package: data.table
> >>> data.table 1.8.7 For help type: help("data.table")
> >>>
> >>> R> x.df <- data.frame(x=letters[1:2], y=1:10)
> >>> R> x.dt <- data.table(x.df)
> >>> R>
> >>> R> my.func <- function (d) { # Define a function on the subset
> >>> +     sum(sqrt(d[["y"]]))
> >>> + }
> >>> R>
> >>> R> # The plyr way:
> >>> R> ddply(x.df, "x", my.func) -> ans.plyr
> >>> R>
> >>> R> # The data.table way:
> >>> R> x.dt[ , my.func(.SD), by=x] -> ans.dt
> >>> R>
> >>> R> ans.plyr
> >>>   x       V1
> >>> 1 a 10.61387
> >>> 2 b 11.85441
> >>>
> >>> R> ans.dt
> >>>    x       V1
> >>> 1: a 10.61387
> >>> 2: b 11.85441
> >>>
> >>> For more help, try this on an R prompt:
> >>>
> >>> R> vignette('datatable-faq')
> >>>
> >>> --
> >>> ASB.
> >>>
> >>> On Thu, Jan 17, 2013 at 9:49 PM, David Bellot <[email protected]> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I've been looking all around the web without a clear answer to this
> >>>> trivial problem. I'm sure I'm not looking where I should:
> >>>>
> >>>> in fact, I want to replace my use of ddply from the plyr package by
> >>>> data.table. One of my main uses is to group a big data.frame by a group
> >>>> of variables and do something on this sub data.frame:
> >>>>
> >>>> ddply( my_df, my_grouping_var, function (d) { do something with d } )
> >>>> ----> d is a data.frame again
> >>>>
> >>>> and it's slow on a big data.frame.
> >>>>
> >>>> However, I don't really understand how to redo the same thing with a
> >>>> data.table. Basically if "j" in a data.table is equivalent to the select
> >>>> clause in SQL, then how do I do SELECT * FROM etc...
> >>>>
> >>>> I want to be able to pass a function like in ddply that will receive not
> >>>> only a few columns but the full subset that is selected by the "by"
> >>>> clause.
> >>>>
> >>>> Thanks...
> >>>> Best,
> >>>> David
> >>>>
> >>>> _______________________________________________
> >>>> datatable-help mailing list
> >>>> [email protected]
> >>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
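
For readers of the archive, the three points made in the thread can be put side by side in one short script. This is only a minimal sketch, not part of the original messages. It assumes the data.table package is installed and rebuilds the toy data with rep() instead of relying on recycling. f and my.func come from the thread; first.rows, ans.sd, ans.direct, y.vec and y.col are names introduced here for illustration.

```r
library(data.table)

# Toy data equivalent to the thread's x.df / x.dt: x alternates a, b and y is 1..10.
x.dt <- data.table(x = rep(c("a", "b"), times = 5), y = 1:10)

# 1. Full per-group subset: .SD is the subset of rows for the current group
#    (without the grouping column), and it is itself a data.table.
f <- function(d) head(d, 1)
first.rows <- x.dt[, f(.SD), by = x]

# 2. A function that receives the subset should use d[["y"]] or d$y,
#    because d is a data.table, not a data.frame.
my.func <- function(d) sum(sqrt(d[["y"]]))
ans.sd <- x.dt[, my.func(.SD), by = x]

# 3. The form recommended in the thread when only some columns are needed:
#    name the column directly (unquoted) in j and skip .SD.
ans.direct <- x.dt[, sum(sqrt(y)), by = x]

# Selecting a column by a quoted name. The thread describes x.dt[, "y"] as
# evaluating to just "y" in data.table 1.8.7; later releases may behave
# differently, so the two forms below are the ones the thread suggests.
y.vec <- x.dt[["y"]]                 # plain vector
y.col <- x.dt[, "y", with = FALSE]   # one-column data.table

print(first.rows)
print(ans.sd)
print(ans.direct)
```

Forms 2 and 3 should give the same per-group sums as the ans.plyr and ans.dt output quoted above. Form 3 is the one Matthew recommends whenever the function does not need the whole subset.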
