Hi,

On Sun, Nov 18, 2012 at 11:19 AM, Philip de Witt Hamer
<[email protected]> wrote:
> Dear all,
>
> data.table is great! thanks for this life(time)saving package.
>
> Now, I run into a difficult nut to crack using ':='.
> I'd like to do a calculation using column information conditional on another
> column
>
> first some jumbo data:
>
> library(data.table)
> DT <- data.table(
>   1:50,
>   rep(1:5,each=10),
>   runif(50,0,1)
> )
> setnames(DT, 1:3, c("id","grp","p"))
>
> id's are unique
> grp's speaks for itself
> think of p's as e.g. p-values
>
> next, if I want to obtain the nr of p values at least as extreme as the p of
> each row from the whole set, this seems to work well:
>
> DT[,c1 := sum(DT[,p] <= p), by=id]
>
> but then, I would like to get the nr of p values at least as extreme as the
> p of each row for the subset with identical grp, I am having a hard time,
> because these attempts fail:
>
> DT[,c2 := sum(DT[grp,p] <= p),by=id]
> DT[,c3 := sum(DT[DT[,grp]==grp,p] <= p), by=id]

You will want to group by "grp".

This gets you pretty close, although it fails the "ties" criterion:

  DT[, cg := rank(p) - 1, by=grp]

If you *really* want to keep the ties criterion, here is a way to do
so that avoids a for loop:

  DT[, cgo := rowSums(outer(p, p, '-') > 0), by=grp]

The problem is that if your groups are very large, the `outer` call
might chew through lots of RAM, since you'll be creating an n x n
matrix per group, where n is the number of rows in that group.

Does that get you where you need to be?

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
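
For reference, here is a minimal sketch of the per-group count using the same `<=` comparison as the `c1` line in the question, so each row counts itself. The two expressions in the reply count only the strictly smaller values. The use of `ties.method = "max"`, the fixed seed, and the column names `c_rank`, `c_outer`, and `check` are assumptions made for illustration and do not appear in the thread.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the example is reproducible
DT <- data.table(
  id  = 1:50,
  grp = rep(1:5, each = 10),
  p   = runif(50, 0, 1)
)

# Per group: number of p values <= this row's p (the row itself included).
# ties.method = "max" gives every member of a tie the highest rank in the
# tie, which equals the count of values less than or equal to it.
DT[, c_rank := rank(p, ties.method = "max"), by = grp]

# Same count via outer(); this builds an n x n logical matrix per group,
# where n is the group size, so memory grows quadratically with n.
DT[, c_outer := rowSums(outer(p, p, ">=")), by = grp]

# Brute-force reference, one row at a time, to compare against.
check <- vapply(
  seq_len(nrow(DT)),
  function(i) sum(DT$p[DT$grp == DT$grp[i]] <= DT$p[i]),
  integer(1)
)

stopifnot(all(DT$c_rank == check), all(DT$c_outer == check))
```

The `rank` version avoids the per-group matrix, so it is likely the better fit when groups are large.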
