Akhil,

Kind of, but defining :

my.func <- function (d) {
    sum(sqrt(d[["y"]]))
}

followed by

x.dt[ , my.func(.SD), by=x]

isn't very data.table'ish. In fact, the
advice is to avoid .SD where possible, for speed.

We'd forget my.func, and just do:

x.dt[, sum(sqrt(y)), by=x]

That is how we recommend using it; it also
allows data.table to optimize the query (which
use of .SD may prevent).
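
As a minimal sketch of the difference (the data mirrors Akhil's toy example
further down; the names ans.sd, ans.direct, ans.multi, s and n are only
illustrative), both forms should give the same per-group values, and several
results per group can be returned by wrapping them in list():

```r
library(data.table)

# Same toy data as in the example further down the thread
x.dt <- data.table(x = rep(letters[1:2], 5), y = 1:10)

# Function applied to the whole per-group subset via .SD
my.func <- function(d) sum(sqrt(d[["y"]]))
ans.sd <- x.dt[, my.func(.SD), by = x]

# Column expression written directly in j
ans.direct <- x.dt[, sum(sqrt(y)), by = x]

# Both forms give the same per-group values
stopifnot(isTRUE(all.equal(ans.sd$V1, ans.direct$V1)))

# Several results per group: return a list from j
# (s and n are illustrative column names)
ans.multi <- x.dt[, list(s = sum(sqrt(y)), n = length(y)), by = x]
```

Any speed difference will depend on the data and the data.table version, so
it is worth timing both forms on your own data.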

David - have you read the introduction vignette and have
you worked through example(data.table) at the prompt?

Matthew


On 17.01.2013 16:53, Akhil Behl wrote:
If I am not wrong, you are looking for `.SD'. In fact you can put in
the exact function you were throwing at ddply earlier. There are other
special names like .SD that you can find in the data.table FAQs.

Let's see:
R> require(plyr)
Loading required package: plyr

R> require(data.table)
Loading required package: data.table
data.table 1.8.7  For help type: help("data.table")

R> x.df <- data.frame(x=letters[1:2], y=1:10)
R> x.dt <- data.table(x.df)
R>
R> my.func <- function (d) { # Define a function on the subset
+ sum(sqrt(d[["y"]]))
+ }
R>
R> # The plyr way:
R> ddply(x.df, "x", my.func) -> ans.plyr
R>
R> # The data.table way:
R> x.dt[ , my.func(.SD), by=x] -> ans.dt
R>
R> ans.plyr
  x       V1
1 a 10.61387
2 b 11.85441

R> ans.dt
   x       V1
1: a 10.61387
2: b 11.85441
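
If the function really does need several columns of the subset, one option
is to restrict .SD with .SDcols so that only the needed columns are built
for each group. A rough sketch, assuming a hypothetical extra column z that
is not part of the example above:

```r
library(data.table)

# Illustrative data: z is a hypothetical extra column, not in the original example
dt <- data.table(x = rep(letters[1:2], 5), y = 1:10, z = (1:10)^2)

# .SD holds the per-group subset; .SDcols limits which columns it contains.
# lapply() applies the same summary to each selected column.
ans.sdcols <- dt[, lapply(.SD, function(col) sum(sqrt(col))),
                 by = x, .SDcols = c("y", "z")]
```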

For more help, try this on an R prompt:

R> vignette('datatable-faq')

--
ASB.

On Thu, Jan 17, 2013 at 9:49 PM, David Bellot <[email protected]> wrote:
Hi,

I've been looking all around the web without a clear answer to this trivial
problem. I'm sure I'm not looking where I should:

In fact, I want to replace my use of ddply from the plyr package with
data.table. One of my main uses is to group a big data.frame by a set of
variables and do something on each sub data.frame:

ddply( my_df, my_grouping_var, function (d) { do something with d } )
----> d is a data.frame again

and it's slow on a big data.frame.


However, I don't really understand how to redo the same thing with a
data.table. Basically if "j" in a data.table is equivalent to the select
clause in SQL, then how do I do SELECT * FROM etc...

I want to be able to pass a function like in ddply that will receive not only a few columns but the full subset that is selected by the "by" clause.

Thanks...
Best,
David

_______________________________________________
datatable-help mailing list
[email protected]

https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help