Well done, Matthew! Will try to test it soon ...
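A quick check could look something like the sketch below. It assumes a data.table build that already includes the new .BY symbol, and it reuses Andreas's dt and groupCols from the quoted message. The unlist() call is an assumption on my side: .BY is a list, and sum() does not accept a list directly, so dt[, sum(.BY), by = groupCols] as written in the quoted message would probably need this small change.

```r
library(data.table)

# Andreas's minimal example: two binary columns, two distinct patterns.
dt <- data.table(x = c(0, 1, 0, 1), y = c(1, 0, 1, 0))
groupCols <- c("x", "y")

# .BY is a list with one length-1 element per 'by' column, so it has to be
# flattened before summing (assumption: unlist() is an acceptable way to do it).
res <- dt[, sum(unlist(.BY)), by = groupCols]
print(res)

# The same result via the older parse()/eval() workaround, for comparison.
expr <- parse(text = sprintf("sum(%s)",
                             paste(groupCols, "[1]", sep = "", collapse = ", ")))
res_old <- dt[, eval(expr), by = groupCols]
```

If both calls give one row per distinct (x, y) pattern with the same V1, the .BY version can stand in for the string-built expression without knowing the column names in advance.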
Thanks,
-steve

On Tue, Jun 21, 2011 at 3:40 PM, Matthew Dowle <[email protected]> wrote:
> Andreas, Steve,
>
> Committed. Please test and confirm if it satisfies all needs ok?
>
> o   A new symbol .BY is available to j, containing 1 row
>     of the current 'by' variables, type list. 'by' variables
>     may also be used by name and they are now length 1, too.
>     This implements FR#1313.
>     For example :
>         DT[,sum(x)*.BY[[1]],by=y]
>         DT[,sum(x)*.BY[[1]],by=eval(byexp)]
>         DT[,sapply(.SD,sum)*y,by=y]
>         DT[,sapply(.SD,sum)*.BY[[2]],by=list(y,z)]
>
> Matthew
>
>
>
> On Wed, 2011-05-11 at 10:24 +0200, Andreas Borg wrote:
>> Hi Steve,
>>
>> > Now that you've brought this back up, what do you think you would
>> > prefer? For example, using my (admittedly contrived) original example:
>> >
>> > result <- some.big.data.table[, by=list(colA, colB), {
>> >   ## Sometimes I want to know what the current values of
>> >   ## colA and colB are in here to get some more info. Maybe
>> >   ## we can have .BY:
>> >
>> >   xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
>> >   ## ...
>> > }]
>> >
>> > Should it be `J(.BY[1], .BY[2])` or is something like `J(colA, colB)`
>> > more natural, you think?
>> >
>> >
>> 'J(colA, colB)' is perfect if you know the column names in advance. This
>> is not true in my case. I created a minimal example for a possible
>> application for a '.BY' construct:
>>
>> > dt <- data.table(x=c(0,1,0,1), y=c(1,0,1,0))
>> > dt
>>      x y
>> [1,] 0 1
>> [2,] 1 0
>> [3,] 0 1
>> [4,] 1 0
>>
>> From this table, I want the row sum for each group, i.e. "select x + y
>> from dt group by x, y" in SQL. This would be:
>>
>> > setkey(dt, x, y)
>> > dt[,sum(x[1], y[1]), by=list(x,y)]
>>      x y V1
>> [1,] 0 1  1
>> [2,] 1 0  1
>>
>> But what if dt can have an arbitrary number of (grouping) columns with
>> arbitrary names? If the grouping columns are given as
>>
>> groupCols <- c("x", "y")
>>
>> , the following is possible:
>>
>> > expr <- parse(text = sprintf("sum(%s)", paste(groupCols, "[1]",
>> sep="", collapse=", ")))
>> > dt[,eval(expr), by=groupCols]
>>      x y V1
>> [1,] 0 1  1
>> [2,] 1 0  1
>>
>> Now, this is certainly uglier than
>>
>> > dt[, sum(.BY), by = groupCols]
>>
>> My actual application is that I apply decision tree models (rpart) to a
>> large number of binary patterns. In order to save computation time, I
>> classify each distinct pattern only once. So what I basically do is to
>> group by all attributes and apply the model once to each group.
>>
>> Andreas
>>
>
>
>

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
