Re: [R] Median computation

2012-05-23 Thread Preeti
Hello Everybody, The code: dfmed <- lapply(unique(colnames(df)), function(x) rowMedians(as.matrix(df[, colnames(df) == x]), na.rm=TRUE)) takes a really long time to execute (hours). Is there a faster way to do this? Thanks! On Tue, May 22, 2012 at 3:46 PM, Preeti pre...@sci.utah.edu wrote:

Re: [R] Median computation

2012-05-23 Thread Bert Gunter
Assuming your original matrix IS a matrix, call it yourmat, and not a data frame (whose columns *must* have unique names if you haven't messed with the check.names default) then maybe: UNTESTED!!! ### thenames <- unique(dimnames(yourmat)[[2]]) ans <- lapply(thenames, function(nm, { apply(
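
The snippet above is cut off by the archive, so the following is only a guess at the shape of the idea, not Bert's actual code: a minimal base-R sketch that groups the columns of a matrix by name and takes per-row medians within each group. The toy matrix, the `drop = FALSE` guard, and the names `ans` and `grouped_medians` are assumptions made for illustration.

```r
# Hypothetical small stand-in for the 250,000 x 300 matrix in the thread.
set.seed(1)
yourmat <- matrix(sample(200, 6 * 8, replace = TRUE), nrow = 6)
colnames(yourmat) <- c("a", "b", "a", "c", "b", "a", "c", "c")

thenames <- unique(colnames(yourmat))

# For each distinct name, select the columns carrying that name and
# compute the median of every row across them. drop = FALSE keeps a
# single-column selection as a matrix so apply() still works.
ans <- lapply(thenames, function(nm) {
  cols <- yourmat[, colnames(yourmat) == nm, drop = FALSE]
  apply(cols, 1, median, na.rm = TRUE)
})
names(ans) <- thenames

# One column of row medians per distinct name.
grouped_medians <- do.call(cbind, ans)
print(grouped_medians)
```

With the real data the same pattern would yield a 250,000-row result with one column per distinct name.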

Re: [R] Median computation

2012-05-23 Thread Benno Pütz
I wonder how you do this (or maybe on what kind of machine you execute it). I tried it out of curiosity and get df = as.data.frame(lapply(1:300,function(x)sample(200,25,T))) colnames(df) = sample(letters[1:20],300,T) system.time(dfmed <- lapply(unique(colnames(df)), function(x) +

Re: [R] Median computation

2012-05-23 Thread Preeti
Hmm.. that is interesting... I did this on our server machine which has about 200 cores. So memory is not an issue. Also, building the dataframe takes about a few minutes maximum for me. My code is similar to yours but for the fact that I create my dataframe from read.delim(filename) and then I
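
The thread does not show how read.delim() was actually called, so this is only a possibility worth checking: with its default check.names = TRUE, read.delim() renames duplicated column headers to make them unique, which would leave a comparison like colnames(df) == x matching one column at a time. A small sketch with made-up inline data (the `txt` string is hypothetical):

```r
# Made-up tab-separated input with a duplicated header "a".
txt <- "a\tb\ta\n1\t2\t3\n4\t5\t6\n"

# Default behaviour: duplicated names are made unique.
df_default <- read.delim(text = txt)
print(colnames(df_default))

# Keeping the names exactly as they appear in the file.
df_raw <- read.delim(text = txt, check.names = FALSE)
print(colnames(df_raw))

# How many columns would be grouped under "a" in each case.
print(sum(colnames(df_default) == "a"))
print(sum(colnames(df_raw) == "a"))
```

If the column names in the real data frame look like a, a.1, a.2, grouping by exact name will not pool the intended columns.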

Re: [R] Median computation

2012-05-23 Thread Henrik Bengtsson
Just adding a few cents to this: rowMedians(x) is roughly 4-10 times faster than apply(x, MARGIN=1, FUN=median) - at least on my local Windows 7 64bit tests. You can do these simple benchmark runs yourself via the matrixStats/tests/rowMedians.R system test, cf. http://goo.gl/YCJed [R-forge].
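
For readers who want a quick comparison without the package's own test script, here is a rough timing sketch. It is not the matrixStats system test mentioned above; the matrix size is arbitrary, and the requireNamespace() guard is an assumption so the code still runs where matrixStats is not installed. Timings will vary by machine.

```r
set.seed(42)
# Arbitrary test matrix; much smaller than the data discussed in the thread.
x <- matrix(rnorm(5000 * 15), nrow = 5000)

# Baseline: per-row median via apply().
t_apply <- system.time(m_apply <- apply(x, MARGIN = 1, FUN = median))

if (requireNamespace("matrixStats", quietly = TRUE)) {
  # Same computation with matrixStats::rowMedians().
  t_fast <- system.time(m_fast <- matrixStats::rowMedians(x))
  # Both approaches should agree on the values.
  stopifnot(isTRUE(all.equal(m_apply, m_fast)))
  print(c(apply = t_apply[["elapsed"]], rowMedians = t_fast[["elapsed"]]))
} else {
  message("matrixStats is not installed; only the apply() timing is shown")
  print(c(apply = t_apply[["elapsed"]]))
}
```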

Re: [R] Median computation

2012-05-23 Thread peter dalgaard
On May 23, 2012, at 19:30 , Preeti wrote: Hmm.. that is interesting... I did this on our server machine which has about 200 cores. So memory is not an issue. Also, building the dataframe takes about a few minutes maximum for me. My code is similar to yours but for the fact that I create my

Re: [R] Median computation

2012-05-23 Thread Preeti
On Wed, May 23, 2012 at 11:54 AM, peter dalgaard pda...@gmail.com wrote: On May 23, 2012, at 19:30 , Preeti wrote: Hmm.. that is interesting... I did this on our server machine which has about 200 cores. So memory is not an issue. Also, building the dataframe takes about a few minutes

Re: [R] Median computation

2012-05-23 Thread Bert Gunter
Yes, thanks Henrik. I neglected to mention that rowMedians could just be plugged in instead of apply (..,1,...) However, my main point is that that's probably not what matters, as Benno points out. Maybe it's the data frames instead of the matrices, but The process should execute in a few
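
If the data-frame subsetting is indeed part of the cost (the thread leaves this open), one thing to try is converting to a matrix once and precomputing the column positions for each name. The sketch below uses small simulated data modelled on Benno's example; `idx` and `row_median` are hypothetical helper names, and the fallback to apply() is an assumption for machines without matrixStats.

```r
set.seed(7)
# Small simulated data frame with repeated column names.
df <- as.data.frame(lapply(1:30, function(i) sample(200, 25, replace = TRUE)))
colnames(df) <- sample(letters[1:5], 30, replace = TRUE)

# Convert once, rather than calling as.matrix() inside the loop.
m <- as.matrix(df)

# Column positions grouped by column name.
idx <- split(seq_len(ncol(m)), colnames(m))

# Use rowMedians() when available, otherwise fall back to apply().
row_median <- if (requireNamespace("matrixStats", quietly = TRUE)) {
  function(x) matrixStats::rowMedians(x, na.rm = TRUE)
} else {
  function(x) apply(x, 1, median, na.rm = TRUE)
}

# One vector of row medians per distinct column name.
dfmed <- lapply(idx, function(j) row_median(m[, j, drop = FALSE]))
str(dfmed)
```

Whether this helps on the 250,000 by 300 data would need to be measured with system.time() on the real input.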

[R] Median computation

2012-05-22 Thread Preeti
Hi, I have a 250,000 by 300 matrix. I am trying to calculate the median of those columns (by row) with column names that are identical. I would like this to be efficient since apply(x,1,median) where x is created by choosing only those columns with same column name and looping on this is taking a

Re: [R] Median computation

2012-05-22 Thread Petr Savicky
On Tue, May 22, 2012 at 01:34:45PM -0600, Preeti wrote: Hi, I have a 250,000 by 300 matrix. I am trying to calculate the median of those columns (by row) with column names that are identical. I would like this to be efficient since apply(x,1,median) where x is created by choosing only those

Re: [R] Median computation

2012-05-22 Thread Henrik Bengtsson
See rowMedians() of the matrixStats package for replacing apply(x, MARGIN=1, FUN=median). /Henrik On Tue, May 22, 2012 at 12:34 PM, Preeti pre...@sci.utah.edu wrote: Hi, I have a 250,000 by 300 matrix. I am trying to calculate the median of those columns (by row) with column names that are

Re: [R] Median computation

2012-05-22 Thread Preeti
Thanks Henrik! Here is the one-liner that I wrote: dfmed <- lapply(unique(colnames(df)), function(x) rowMedians(as.matrix(df[, colnames(df) == x]), na.rm=TRUE)) Thanks again! On Tue, May 22, 2012 at 3:23 PM, Henrik Bengtsson h...@biostat.ucsf.edu wrote: See rowMedians() of the matrixStats package