Hi Andreas,

On Tue, May 10, 2011 at 9:38 AM, Andreas Borg <[email protected]> wrote:
> Hi all,
>
> I support this proposal (original message below). One more suggestion on
> this: It might be useful if the proposed ".BY" object would have only a
> single row with the current values of the grouping variables instead of as
> much (duplicate) rows as the group. Whatever computation one wants to do
> with .BY would need to be executed only once and the result recycled for
> each row in the group.
>
> Anyway, are there any news on this topic?

It is on the radar:

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1313&group_id=240&atid=978

I was planning on letting Matthew crank this one out since it is in the
C guts of data.table, but maybe I can take a look at it, too.

Although I initially proposed the .BY thing, I think Matthew's follow-up
(I forget) might have questioned the reasoning behind using .BY instead
of just injecting the variables into the scope without .BY.

Now that you've brought this back up, which would you prefer?

For example, using my (admittedly contrived) original example:

    result <- some.big.data.table[, by=list(colA, colB), {
      ## Sometimes I want to know what the current values of
      ## colA and colB are in here to get some more info. Maybe
      ## we can have .BY:
      xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
      ## ...
    }]

Should it be `J(.BY[1], .BY[2])`, or is something like `J(colA, colB)`
more natural, do you think?

I also agree with you that each of the BY values only needs to have
length 1 (and not, say, the same length as nrow(.SD)).

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
