Hi,
I have a huge matrix (4000 * 2000 data points) and I would like to retrieve
the coordinates (column and row) for the top 50 (or x) values. Some
positions in the matrix have NA as a value. These should be discarded.
My current method is to replace all NAs by 0, then rank all the values and
A matrix is just a vector, so order() should work. I haven't verified the
following code.
a <- matrix(rnorm(4000*2000), 4000, 2000)
b <- order(a, na.last=TRUE, decreasing=TRUE)[1:50]
Use %% and %/% on the positions in b to get the row and column numbers.
Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of
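A minimal sketch of the %% / %/% step, run on a small random matrix rather than the real 4000 x 2000 one (the sizes, the seed, and the names `n`, `rows`, `cols` and `top` are illustrative assumptions). R stores a matrix column by column, so linear position i lies in row (i - 1) %% nrow + 1 and column (i - 1) %/% nrow + 1; base R's arrayInd() should give the same pairs.

```r
# Small stand-in for the 4000 x 2000 matrix
set.seed(1)
a <- matrix(rnorm(20 * 10), nrow = 20, ncol = 10)
a[sample(length(a), 30)] <- NA   # sprinkle some NAs

n <- 5                           # number of top values wanted
# Linear (column-major) positions of the n largest values;
# na.last = TRUE pushes the NAs to the end, so they are never picked
b <- order(a, na.last = TRUE, decreasing = TRUE)[1:n]

# Column-major layout: position i sits at
#   row = (i - 1) %% nrow + 1,  col = (i - 1) %/% nrow + 1
rows <- (b - 1) %% nrow(a) + 1
cols <- (b - 1) %/% nrow(a) + 1

top <- data.frame(row = rows, col = cols, value = a[b])
print(top)

# Base R's arrayInd() performs the same conversion
stopifnot(all(arrayInd(b, dim(a)) == cbind(rows, cols)))
```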
Hi:
Here's a faked up example:
a <- matrix(rnorm(4000*2000), 4000, 2000)
# Generate some NAs in the matrix (a 50 x 50 block of them)
nr <- sample(1:4000, 50)
nc <- sample(1:2000, 50)
a[nr, nc] <- NA
# convert to data frame:
b <- data.frame(row = rep(1:4000, 2000), col = rep(1:2000, each = 4000),
x =
m <- matrix(round(rnorm(4000 * 2000), 4), nr = 4000)
is.na(m) <- sample(8e6, 1e6)
system.time(
  idx <- which(
    matrix(m %in% head(sort(m, TRUE), 50),
           nr = nrow(m)), arr.ind = TRUE))
#    user  system elapsed
#    3.12    0.19    3.18
-Peter Ehlers
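A scaled-down sketch of Peter's approach that also pulls out the values, assuming a 40 x 20 matrix rounded to one decimal so that ties can actually occur (the sizes, seed, and the names `top_vals`, `flag` and `res` are hypothetical). Since which() reports positions in column-major order rather than by value, the rows are reordered afterwards.

```r
# Small stand-in; rounding to 1 decimal makes ties likely
set.seed(42)
m <- matrix(round(rnorm(40 * 20), 1), nrow = 40)
is.na(m) <- sample(length(m), 100)   # 100 NAs

n <- 10
# sort() drops NAs by default, so the top-n values come only from non-NA cells
top_vals <- head(sort(m, decreasing = TRUE), n)

# Flag every cell whose value is among the top-n values (NA %in% ... is FALSE),
# then let which(arr.ind = TRUE) turn the flags into (row, col) pairs
flag <- matrix(m %in% top_vals, nrow = nrow(m))
idx <- which(flag, arr.ind = TRUE)

# Attach the values and reorder from largest to smallest
res <- cbind(idx, value = m[idx])
res <- res[order(res[, "value"], decreasing = TRUE), , drop = FALSE]
print(res)
```

Because %in% matches every cell holding a tied value, this can return more than n rows; whether that is desirable depends on how ties should be treated.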
On 2010-06-18 5:13, Dennis
You might also want to consider _partial sorting_ by using the
'partial' argument of sort(), especially when the number of data
points is really large.
Since argument 'decreasing=TRUE' is not supported when using
'partial', you have to flip it yourself by negating the values, e.g.
x -
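A minimal sketch of the negation trick, assuming a plain numeric vector with some NAs (the length, seed, and the names `part` and `top` are made up). sort() drops NAs by default, and partial = n only guarantees that position n holds its final value with everything before it no larger, so the n selected values still need a final, cheap sort.

```r
set.seed(7)
x <- rnorm(1e6)
is.na(x) <- sample(length(x), 1e5)
n <- 50

# The n smallest values of -x are the n largest values of x.
# After a partial sort, elements 1..n are the n smallest, but unordered.
part <- -sort(-x, partial = n)[1:n]

# Fully sort only these n values
top <- sort(part, decreasing = TRUE)

# Same answer as a full sort of all 900,000 non-NA values
stopifnot(identical(top, sort(x, decreasing = TRUE)[1:n]))
```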
Hi:
From what I can tell, Henrik efficiently finds the 50 largest values without
the matrix indices, and Peter efficiently finds the matrix indices without the
corresponding values.
Let's combine the two:
x <- rnorm(8e6)
is.na(x) <- sample(8e6, 1e6)
n <- 50
x1 <- sort(x, decreasing=TRUE)[1:n]
# Find
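One possible way to finish the combination, sketched under the assumption that a single threshold is acceptable: use a partial sort to get the n-th largest value, then which(..., arr.ind = TRUE) to get the matrix indices of every cell at or above it. The 400 x 200 size, the seed, and the names `thresh`, `idx` and `res` are illustrative, and this is not necessarily how the original message continued.

```r
set.seed(123)
m <- matrix(rnorm(400 * 200), nrow = 400)
is.na(m) <- sample(length(m), 10000)
n <- 50

# Step 1 (partial sort): the n-th largest non-NA value serves as a threshold
thresh <- -sort(-m, partial = n)[n]

# Step 2 (matrix indices): every cell at or above the threshold.
# NA cells give NA in the comparison, and which() skips NAs.
idx <- which(m >= thresh, arr.ind = TRUE)

# Step 3: attach the values and order from largest to smallest
res <- data.frame(row = idx[, "row"], col = idx[, "col"], value = m[idx])
res <- res[order(res$value, decreasing = TRUE), ]
rownames(res) <- NULL
print(head(res))
```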