Hey David, I thought your problem may have been a typo, but I realized that it is in fact a subtle difference between the way data.table and data.frame work.
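The difference is in how the j argument treats column names. Here is a minimal sketch of the access forms that should work, assuming a toy table shaped like the x.dt from earlier in this thread (the names v1, v2 and d3 are only for illustration). I have left the quoted form x.dt[ , "y"] out of the code on purpose, because what it evaluates to may depend on the data.table version.

```r
library(data.table)

# Toy data (an assumption): two groups, ten rows, like the thread's example.
x.dt <- data.table(x = rep(c("a", "b"), times = 5), y = 1:10)

# Unquoted name in j: evaluated within the table, gives the column as a vector.
v1 <- x.dt[ , y]

# List-style extraction by name, as with a data.frame: also a vector.
v2 <- x.dt[["y"]]

# with=FALSE makes j select columns by name; the result is a
# one-column data.table rather than a bare vector.
d3 <- x.dt[ , "y", with = FALSE]

identical(v1, v2)
identical(d3[["y"]], v1)
```

Note that the first two forms return a plain vector while the with=FALSE form keeps the data.table wrapper, so they are not interchangeable inside a function that expects one or the other.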
One must provide unquoted names in the `j' expression for a data.table, i.e. one can say x.dt[ , y] but not x.dt[ , "y"] (which evaluates to just the string "y", hence the error). There are ways around it, such as using with=FALSE or the data.frame notation x.dt[["y"]]. But once again, you will find such examples and explanations of idiomatic data.table expressions in the vignettes.

--
ASB.

On Thu, Jan 17, 2013 at 10:42 PM, David Bellot <[email protected]> wrote:
> Hi Matthew,
>
> I did read the introduction but I wasn't sure about the way to write it.
> Hence my question.
>
> In fact, I do agree if the function were sum(sqrt(y)), but in my case, I
> would like to do something like
>
> f <- function(d) head(d,1)
>
> It's a small example for the sake of simplicity, just to illustrate that I
> really want to have access to the full sub data.frame (the d variable) and
> not just one column.
>
> Best,
> David
>
> On Thu, Jan 17, 2013 at 5:07 PM, Matthew Dowle <[email protected]>
> wrote:
>>
>> Akhil,
>>
>> Kind of, but defining :
>>
>> my.func <- function (d) {
>>   sum(sqrt(d[["y"]]))
>> }
>>
>> followed by
>>
>> x.dt[ , my.func(.SD), by=x]
>>
>> isn't very data.table'ish. In fact the
>> advice is to avoid .SD if possible, for speed.
>>
>> We'd forget my.func, and just do :
>>
>> x.dt[, sum(sqrt(y)), by=x]
>>
>> That is how we recommend it to be used, and
>> allows data.table to optimize the query (which
>> use of .SD may prevent).
>>
>> David - have you read the introduction vignette and have
>> you worked through example(data.table) at the prompt?
>>
>> Matthew
>>
>>
>> On 17.01.2013 16:53, Akhil Behl wrote:
>>>
>>> If I am not wrong, you are looking for `.SD'. In fact you can put in
>>> the exact function you were throwing at ddply earlier. There are other
>>> special names like .SD that you can find in the data.table FAQs.
>>>
>>> Let's see:
>>> R> require(plyr)
>>> Loading required package: plyr
>>>
>>> R> require(data.table)
>>> Loading required package: data.table
>>> data.table 1.8.7 For help type: help("data.table")
>>>
>>> R> x.df <- data.frame(x=letters[1:2], y=1:10)
>>> R> x.dt <- data.table(x.df)
>>> R>
>>> R> my.func <- function (d) { # Define a function on the subset
>>> +   sum(sqrt(d[["y"]]))
>>> + }
>>> R>
>>> R> # The plyr way:
>>> R> ddply(x.df, "x", my.func) -> ans.plyr
>>> R>
>>> R> # The data.table way:
>>> R> x.dt[ , my.func(.SD), by=x] -> ans.dt
>>> R>
>>> R> ans.plyr
>>>   x       V1
>>> 1 a 10.61387
>>> 2 b 11.85441
>>>
>>> R> ans.dt
>>>    x       V1
>>> 1: a 10.61387
>>> 2: b 11.85441
>>>
>>> For more help, try this on an R prompt:
>>>
>>> R> vignette('datatable-faq')
>>>
>>> --
>>> ASB.
>>>
>>> On Thu, Jan 17, 2013 at 9:49 PM, David Bellot <[email protected]>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've been looking all around the web without a clear answer to this
>>>> trivial problem. I'm sure I'm not looking where I should:
>>>>
>>>> in fact, I want to replace my use of ddply from the plyr package by
>>>> data.table. One of my main uses is to group a big data.frame by a group
>>>> of variables and do something on this sub data.frame:
>>>>
>>>> ddply( my_df, my_grouping_var, function (d) { do something with d } )
>>>> ----> d is a data.frame again
>>>>
>>>> and it's slow on a big data.frame.
>>>>
>>>> However, I don't really understand how to redo the same thing with a
>>>> data.table. Basically if "j" in a data.table is equivalent to the select
>>>> clause in SQL, then how do I do SELECT * FROM etc...
>>>>
>>>> I want to be able to pass a function like in ddply that will receive not
>>>> only a few columns but the full subset that is selected by the "by"
>>>> clause.
>>>>
>>>> Thanks...
>>>> Best,
>>>> David
>>>>
>>>> _______________________________________________
>>>> datatable-help mailing list
>>>> [email protected]
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
