[datatable-help] columns in .SD with grouping ad-hoc using "by"

Arunkumar Srinivasan Sun, 12 May 2013 01:12:26 -0700

Hi,  

Suppose you have a data.table, say:


require(data.table)
DT <- data.table(x = 1:5, y = 6:10)

Suppose you want to group by "x %/% 2" (= 0, 1, 1, 2, 2) and then calculate the 
sum of each column for each group. One would do:

DT[, grp := x %/% 2]
# avoid .SD when there are only a few columns
DT[, list(x.sum=sum(x), y.sum=sum(y)), by = grp]

Now, assume that you have many columns, which would make the use of `.SD` 
sensible.

DT[, lapply(.SD, sum), by = grp]
   grp x  y
1:   0 1  6
2:   1 5 15
3:   2 9 19


The issue is that if you create the grouping column ad-hoc, then the column 
from which the ad-hoc grouping column is derived is not available to .SD. Let 
me illustrate this:

DT <- data.table(x = 1:5, y = 6:10)


DT[, lapply(.SD, sum), by = (grp=x %/% 2)] # ad-hoc creation of grouping column
   grp  y
1:   0  6
2:   1 15
3:   2 19
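
One possible way around this is sketched below. It rests on an assumption that is not confirmed anywhere in this post: that naming the columns explicitly in `.SDcols` overrides the default exclusion of columns used in the `by` expression. This may depend on the data.table version, and `res_sdcols` is just a name picked for illustration.

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10)

# Assumption: listing "x" in .SDcols keeps it in .SD even though
# "x" is also used to build the ad-hoc grouping column.
res_sdcols <- DT[, lapply(.SD, sum),
                 by = list(grp = x %/% 2),
                 .SDcols = c("x", "y")]
print(res_sdcols)
```

If the assumption holds, the result should contain "grp", "x" and "y" without any temporary column having been added to DT.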



I think it'd be nice to have the source column ("x" here) available to `.SD`, 
so that we can avoid creating a temporary column, grouping, and then deleting 
it. "Technically" the ad-hoc grouping column *is* a new column, so "x" should 
still be available. Any take on this?
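
For concreteness, the create/group/delete workflow referred to here could look roughly like the following. It is only a minimal sketch of the three steps; `res_tmp` is a name picked for illustration.

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10)

DT[, grp := x %/% 2]                         # 1. add the temporary grouping column
res_tmp <- DT[, lapply(.SD, sum), by = grp]  # 2. group; "x" is still part of .SD
DT[, grp := NULL]                            # 3. remove the temporary column again
print(res_tmp)
```

After these three steps DT is back to its original two columns, and `res_tmp` holds the per-group sums of both "x" and "y".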

Arun

