Re: [datatable-help] Best way to apply function to set of columns to create new columns where the function requires other columns from data.table

Frank Erickson Thu, 12 Feb 2015 07:41:14 -0800

Hi Marc,

I think the set function is a good fit:


# add one <name>_mean column per variable; weight column looked up by name
for (j0 in varnames)
  set(dt, j = paste0(j0, '_mean'), value = wtd.mean(dt[[j0]], dt[['weight']]))
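
For reference, here is a self-contained sketch of the loop next to a single := call. It assumes base R's weighted.mean() in place of Hmisc's wtd.mean() (swapped in only to avoid the extra dependency; for complete numeric data the two should agree). The seed and the copies dt1 and dt2 are there purely for illustration. Note that columns of the data.table can be referred to by name inside j even when they are not listed in .SDcols, so the weight column does not have to be part of .SD.

```r
library(data.table)

set.seed(1)  # illustrative data, same shape as in the question
dt <- data.table(a = runif(10), b = runif(10), weight = runif(10))
varnames <- c("a", "b")
newcols  <- paste0(varnames, "_mean")

# Variant 1: loop with set(), reading the weight column by name
dt1 <- copy(dt)
for (j0 in varnames) {
  set(dt1, j = paste0(j0, "_mean"),
      value = weighted.mean(dt1[[j0]], dt1[["weight"]]))
}

# Variant 2: one := call; `weight` is visible in j although it is not in .SDcols
dt2 <- copy(dt)
dt2[, (newcols) := lapply(.SD, weighted.mean, w = weight), .SDcols = varnames]

# Compare the two results
all.equal(dt1, dt2)
```

Both variants recycle each scalar mean down a full column, so which one to prefer is mostly a matter of taste unless the table is large.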

I'd expect this to be noticeably more efficient than nested ['s and .SD's if
your data is large, since set() skips the overhead of [.data.table. If your
data.table is really big, though, maybe you want to store the weighted means
elsewhere...? They're just scalars, so you probably don't need them recycled
down an entire column of the data.table.
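
A minimal sketch of keeping the scalars outside the table, again assuming base R's weighted.mean() rather than wtd.mean(); the names wmeans and wmeans_dt are made up for the example.

```r
library(data.table)

set.seed(1)  # illustrative data, same shape as in the question
dt <- data.table(a = runif(10), b = runif(10), weight = runif(10))
varnames <- c("a", "b")

# One weighted mean per variable, kept as a named numeric vector
wmeans <- sapply(varnames, function(v) weighted.mean(dt[[v]], dt[["weight"]]))

# The same numbers as a one-row summary data.table
wmeans_dt <- dt[, lapply(.SD, weighted.mean, w = weight), .SDcols = varnames]
```

Either object holds one number per variable, and dt itself is left unchanged.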

--Frank


On Tue, Feb 10, 2015 at 1:58 PM, Marc Halperin <[email protected]> wrote:

> I want to add new columns to a data.table that are the weighted averages of
> existing columns, using a weight variable. This is a general problem I run
> into when using .SDcols but also needing another variable from the
> data.table to be available to the function inside lapply. Without including
> that variable in .SDcols (in this case the weight variable), I don't have
> access to it in the lapply function argument. Is it a bad idea to subset
> .SD the way I've done it?
>
> library(data.table)
> library(Hmisc)
>
> dt <- data.table(a=runif(10), b= runif(10), weight=runif(10))
>
> varnames <- c("a","b")
>
> dt[ , ( paste( "mean", varnames, sep = "_" ) ) := lapply( .SD[ , .SD,
> .SDcols = -"weight" ], wtd.mean, weight ), .SDcols = c("weight",varnames) ]
>
> Thanks
>
> -Marc
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
