Hi,

Suppose you have a data.table, say:

    require(data.table)
    DT <- data.table(x = 1:5, y = 6:10)

Suppose you want to group by "x %/% 2" (= 0, 1, 1, 2, 2) and then calculate the sum of each column for each group. You would do:

    DT[, grp := x %/% 2]
    DT[, list(x.sum = sum(x), y.sum = sum(y)), by = grp]  # avoid .SD in case of few columns

Now assume you have so many columns that using `.SD` becomes the sensible choice:

    DT[, lapply(.SD, sum), by = grp]
       grp x  y
    1:   0 1  6
    2:   1 5 15
    3:   2 9 19

The issue is that if you create the grouping column ad hoc, the column from which it is derived is no longer available to `.SD`. Let me illustrate:

    DT <- data.table(x = 1:5, y = 6:10)
    DT[, lapply(.SD, sum), by = (grp = x %/% 2)]  # ad-hoc creation of grouping column
       grp  y
    1:   0  6
    2:   1 15
    3:   2 19

Note that "x" is missing from the result. I think it would be nice to keep the source column available to `.SD` in this case, since "technically" the ad-hoc grouping column *is* a new column (meaning "x" should still be available). That would save us from creating a temporary column, grouping by it, and then deleting it.

Any take on this?

Arun
