Here is one way of doing it:
n <- 20
set.seed(2)
# create test dataframe
x <- as.data.frame(matrix(sample(1:2, n*6, TRUE), nrow=n))
x
V1 V2 V3 V4 V5 V6
1 1 2 2 2 1 1
2 2 1 1 2 2 1
3 2 2 1 2 1 2
4 1 1 1 1 1 2
5 2 1 2 2 1 1
6 2 1 2 1 2 2
7 1 1 2 1
I answered the wrong question. Here is the code to find all the
matches for each row:
n <- 20
set.seed(2)
# create test dataframe
x <- as.data.frame(matrix(sample(1:2, n*6, TRUE), nrow=n))
x
x.col <- c(1,3,5)
# match against all the other rows
x.match1 <- apply(x[, x.col], 1, function(a){
.mat <-
Another approach is:
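n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n*6, TRUE), nrow = n))
x.col <- c(1, 3, 5)
# build one key string per row from the selected columns
values <- do.call(paste, c(x[x.col], sep = "\r"))
# for each row, collect the other rows with the same key
out <- lapply(seq_along(values), function (i) {
    ind <- which(values == values[i])
    ind[!ind %in% i]
})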
out
Best,
Dimitris
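
A variant of the same pasted-key idea groups the row indices once with split(), so that each row only looks up its own group instead of comparing its key against every other row. This is only a sketch under the same test data as above; the names key, groups and out2 are illustrative and do not come from any message in this thread.

```r
n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n * 6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

# one key string per row, built from the selected columns only
key <- do.call(paste, c(x[x.col], sep = "\r"))

# row indices grouped by distinct key (a named list, one entry per key)
groups <- split(seq_len(nrow(x)), key)

# for each row, take its group and drop the row itself
out2 <- lapply(seq_len(nrow(x)), function(i) {
  g <- groups[[key[i]]]
  g[g != i]
})
```

Rows with no match get an empty integer vector, as in the which()-based version.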
If I understand your intent, I believe you can get what you want much faster
(no interpreted loops, and linear time) by looking at this slightly
differently.
First of all, the choice of columns is unimportant, as indexing can be used
to create a data frame containing only the columns of
Bert, Jim, Dimitris and Joris,
Thank you all very much for your prompt help and suggestions.
After trying the ideas out, I have decided to go with Bert's approach
since it is by far the fastest of the lot.
Thanks again!
Rama Ramakrishnan
On Oct 8, 2009, at 12:49 PM, Bert Gunter wrote:
If I understand your intent, I believe you can get what you want much faster
(no interpreted loops, and linear time) by looking at this slightly
differently.
First of all, the choice of columns is unimportant, as indexing can be used
to create a data frame containing only the
Hi Friends,
I have a data frame d. Let vars be the column indices for a subset of
the columns in d (e.g., vars <- c(1,3,4,8)).
For each row r in d, I want to collect all the other rows in d that
match the values in row r for just the columns in vars.
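
To make the requirement concrete, here is a small sketch for a single row r. The toy data frame, the value of vars, and the names same and matches are made up for illustration and are not from the original post.

```r
# hypothetical toy data; d and vars are illustrative values only
d <- data.frame(a = c(1, 2, 1, 2), b = c(9, 9, 9, 9), c = c(5, 6, 5, 7))
vars <- c(1, 3)

# rows that agree with row r on every column in vars
r <- 1
same <- Reduce(`&`, lapply(vars, function(j) d[[j]] == d[[j]][r]))

# the other matching rows, excluding r itself
matches <- setdiff(which(same), r)
matches
```

Here only row 3 agrees with row 1 on columns a and c, so matches contains 3.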
The naive way to do this is to have a