Hello everybody,
The code:
dfmed <- lapply(unique(colnames(df)), function(x)
  rowMedians(as.matrix(df[, colnames(df) == x]), na.rm = TRUE))
takes a really long time to execute (hours). Is there a faster way to do
this?
Thanks!
On Tue, May 22, 2012 at 3:46 PM, Preeti pre...@sci.utah.edu wrote:
Assuming your original matrix IS a matrix, call it yourmat, and not a
data frame (whose columns *must* have unique names if you haven't
messed with the check.names default), then maybe:
### UNTESTED!!! ###
thenames <- unique(dimnames(yourmat)[[2]])
ans <- lapply(thenames, function(nm) {
apply(
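The reply above is cut off in the archive right after apply(. As a hedged illustration of the matrix-based idea it describes (one apply(..., 1, median) per distinct column name), here is a minimal base-R sketch. The toy matrix, the use of vapply() instead of lapply(), and drop = FALSE are assumptions for illustration, not the original code.

```r
# Hypothetical stand-in for yourmat: 20 rows, 6 columns, repeated names.
set.seed(1)
yourmat <- matrix(rnorm(20 * 6), nrow = 20,
                  dimnames = list(NULL, c("a", "b", "a", "c", "b", "a")))

thenames <- unique(dimnames(yourmat)[[2]])

# One output column per distinct name: select every column carrying that
# name (drop = FALSE keeps a single match as a one-column matrix, which
# apply() needs) and take the median across each row.
ans <- vapply(thenames, function(nm) {
  apply(yourmat[, colnames(yourmat) == nm, drop = FALSE], 1, median,
        na.rm = TRUE)
}, numeric(nrow(yourmat)))

dim(ans)  # 20 rows, one column per name ("a", "b", "c")
```

vapply() is used only so the result comes back as a single matrix with the names as column labels; lapply() as in the reply would return a list of vectors instead.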
I wonder how you do this (or maybe on what kind of machine you execute it).
I tried it out of curiosity and got
df = as.data.frame(lapply(1:300, function(x) sample(200, 25, T)))
colnames(df) = sample(letters[1:20], 300, T)
system.time(dfmed <- lapply(unique(colnames(df)), function(x)
+
Hmm... that is interesting. I did this on our server machine, which has
about 200 cores, so memory is not an issue. Also, building the data frame
takes a few minutes at most for me. My code is similar to yours except
that I create my data frame with read.delim(filename) and
then I
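Because the data frame comes from read.delim(filename), the check.names remark earlier in the thread matters: by default read.delim() makes column names syntactically valid and unique, so repeated names get suffixes and colnames(df) == x would then match only one column per name. A small sketch with made-up inline data (the text = argument stands in for the file):

```r
# Tab-separated input with a repeated column name "a" (made-up example data).
txt <- "a\ta\tb\n1\t3\t5\n2\t4\t6"

# Default check.names = TRUE: make.names(unique = TRUE) renames the duplicate.
d_default <- read.delim(text = txt)
names(d_default)   # "a" "a.1" "b"

# check.names = FALSE keeps the names exactly as they appear in the file.
d_raw <- read.delim(text = txt, check.names = FALSE)
names(d_raw)       # "a" "a" "b"
```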
Just adding a few cents to this:
rowMedians(x) is roughly 4-10 times faster than apply(x, MARGIN=1,
FUN=median), at least in my local Windows 7 64-bit tests. You can run
these simple benchmarks yourself via the
matrixStats/tests/rowMedians.R system test, cf. http://goo.gl/YCJed
[R-forge].
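For a quick check on your own machine without the package's test script, a rough sketch: time both approaches on a made-up matrix (the sizes are arbitrary assumptions) and confirm they agree. The matrixStats part only runs if the package is installed, and the speed ratio you see will depend on the machine and data.

```r
# Made-up test matrix; dimensions are arbitrary.
set.seed(42)
x <- matrix(rnorm(20000 * 10), nrow = 20000)

# Base R: median() is called once per row.
t_apply <- system.time(med_apply <- apply(x, MARGIN = 1, FUN = median))

if (requireNamespace("matrixStats", quietly = TRUE)) {
  # matrixStats: all row medians in a single compiled call.
  t_rm <- system.time(med_rm <- matrixStats::rowMedians(x))
  # Both approaches should give the same medians.
  stopifnot(isTRUE(all.equal(med_apply, med_rm)))
  print(c(apply = t_apply[["elapsed"]], rowMedians = t_rm[["elapsed"]]))
}
```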
On May 23, 2012, at 19:30 , Preeti wrote:
On Wed, May 23, 2012 at 11:54 AM, peter dalgaard pda...@gmail.com wrote:
Yes, thanks Henrik. I neglected to mention that rowMedians could just
be plugged in instead of apply(..., 1, ...).
However, my main point is that that's probably not what matters, as
Benno points out. Maybe it's the data frames instead of the matrices,
but the process should execute in a few
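One hedged way to test whether the data frame itself is part of the cost: compare the one-liner's per-name as.matrix(df[...]) conversion with converting the whole data frame to a matrix once and splitting the column indices by name. The data sizes and the row_med helper (which falls back to apply() when matrixStats is not installed) are assumptions for illustration.

```r
# Made-up stand-in for the data: 1000 rows, 300 columns with repeated names.
set.seed(7)
df <- as.data.frame(matrix(rnorm(1000 * 300), nrow = 1000))
colnames(df) <- sample(letters[1:20], 300, replace = TRUE)

# Hypothetical helper: matrixStats::rowMedians if available, else apply().
row_med <- if (requireNamespace("matrixStats", quietly = TRUE)) {
  function(m) matrixStats::rowMedians(m, na.rm = TRUE)
} else {
  function(m) apply(m, 1, median, na.rm = TRUE)
}

# As in the one-liner: a new matrix is built from the data frame per name.
t_per_group <- system.time(
  med1 <- lapply(unique(colnames(df)), function(x)
    row_med(as.matrix(df[, colnames(df) == x])))
)
names(med1) <- unique(colnames(df))

# Convert once, group column indices by name, then index the matrix.
t_once <- system.time({
  m <- as.matrix(df)
  groups <- split(seq_len(ncol(m)), colnames(m))
  med2 <- lapply(groups, function(idx) row_med(m[, idx, drop = FALSE]))
})

print(c(per_group = t_per_group[["elapsed"]], once = t_once[["elapsed"]]))
```

Note that split() orders the groups alphabetically, while unique() keeps first-appearance order, so compare the two results by name. The drop = FALSE matters once you index a matrix: a name that occurs only once would otherwise come back as a plain vector rather than a one-column matrix, and apply() needs a matrix.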
Hi,
I have a 250,000 by 300 matrix. I am trying to calculate, by row, the median
of those columns that have identical column names. I would like
this to be efficient, since apply(x, 1, median), where x is created by choosing
only those columns with the same column name, and looping over this is taking a
On Tue, May 22, 2012 at 01:34:45PM -0600, Preeti wrote:
See rowMedians() of the matrixStats package for replacing apply(x,
MARGIN=1, FUN=median). /Henrik
On Tue, May 22, 2012 at 12:34 PM, Preeti pre...@sci.utah.edu wrote:
Thanks Henrik! Here is the one-liner that I wrote:
dfmed <- lapply(unique(colnames(df)), function(x)
  rowMedians(as.matrix(df[, colnames(df) == x]), na.rm = TRUE))
Thanks again!
On Tue, May 22, 2012 at 3:23 PM, Henrik Bengtsson h...@biostat.ucsf.edu wrote: