On 12.05.2013 12:54, Arunkumar Srinivasan wrote:
> I just realised that I sent it only to MatthewDowle. So, sending it again. Sorry @Matthew for the double email.
>
> Matthew,
>
>>> .BY is available to j already for that reason, does that work? .BY isn't a column of .SD because i) it's the same value for every row of .SD i.e. .BY[[1]] is length 1 and contains this particular group (replicating the same value would be wasteful)
>
> DT[, print(.BY), by = list(grp = x %/% 2)]
>
> $grp
> [1] 0
> $grp
> [1] 1
> $grp
> [1] 2
>
> DT[, print(.SD), by = list(grp = x %/% 2)] # no column "x"
>
>    y
> 1: 6
>    y
> 1: 7
> 2: 8
>     y
> 1:  9
> 2: 10
>
> My question is not as to why the BY column is not available in .SD. Rather, since .BY does not have column "x" in it (rather the result of x %/% 2), why does .SD not have "x"? It's as if grp = x %/% 2 is a "new column". So, "x" should be available to .SD is my point.

Oh I see now. Yes, data.table inspects the expressions used in 'by', considers any columns used there as grouping columns, and excludes those from .SD. An example is a date column containing daily observations: DT[, lapply(.SD,sum), by=month(date)] would not wish to sum() the "date" column.

In ?data.table I've just changed:

    .SD is a data.table containing the Subset of x's Data for each group, excluding the group column(s).

to

    .SD is a data.table containing the Subset of x's Data for each group, excluding any columns used in 'by' (or 'keyby').

Further answer below ...

>>> but more significantly ii) it is often a character group name where running an aggregation function like sum() would trip up on it.
>
> Again, I don't think so because, I am not asking for .BY columns to be in .SD.
> DT[, grp := x %/% 2]
> DT[, lapply(.SD, sum), by=grp]
> must be equal to:
> DT[, lapply(.SD, sum), by = list(grp = x%/%2)] # here, "x" should be available to .SD as it's not the grouping column

This makes sense in this case because x can be sum()-ed, but it isn't true in general, as in the month(date) case above.
In these cases you can use .SDcols to include all columns, even the ones used by 'by':

> DT[, lapply(.SD, sum), by=list(grp=x%/%2)]
   grp  y
1:   0  6
2:   1 15
3:   2 19
> DT[, lapply(.SD, sum), by=list(grp=x%/%2), .SDcols=names(DT)]
   grp x  y
1:   0 1  6
2:   1 5 15
3:   2 9 19
> DT[, print(.SD), by = list(grp = x %/% 2), .SDcols=names(DT)]
   x y
1: 1 6
   x y
1: 2 7
2: 3 8
   x  y
1: 4  9
2: 5 10

> Arun
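
For anyone wanting to reproduce the behaviour discussed in this message, here is a minimal sketch. DT itself is never shown in the thread, so the definition below is an assumption (x = 1:5, y = 6:10) chosen because it matches the printed results. DT2 and its columns "date" and "v" are hypothetical, made up only to illustrate the month(date) case.

```r
library(data.table)

# Assumed table: not shown in the thread, inferred from the printed output
DT <- data.table(x = 1:5, y = 6:10)

# Default: "x" is used in 'by', so it is left out of .SD and only y is summed
res_default <- DT[, lapply(.SD, sum), by = list(grp = x %/% 2)]

# .SDcols = names(DT) asks for every column, including "x"
res_all <- DT[, lapply(.SD, sum), by = list(grp = x %/% 2),
              .SDcols = names(DT)]

# Hypothetical daily data: "date" is used in 'by', so sum() never sees it
DT2 <- data.table(date = as.Date("2013-01-30") + 0:3, v = 1:4)
res_month <- DT2[, lapply(.SD, sum), by = month(date)]

print(res_default)
print(res_all)
print(res_month)
```

The default suits the date case, where summing the date column would not be meaningful, and .SDcols is the opt-in for cases like x where it is.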
