[R] Issue with %in% - not matching identical rows in data frames

2009-11-03 Thread Kaushik Krishnan
Hi folks

I have two data frames.  I know that the nth (let's say the 7th) row
in the first data frame (sequence) is there in the second
(today.sequence).  When I try to check that by doing 'sequence[7,]
%in% today.sequence', I get all FALSE when it should be all TRUE.

I'm certain I'm making some trivial mistake.  Any solutions?

The code to recreate the data frames and see for yourself is:

sequence <- structure(list(DATE = structure(c(14549, 14549, 14553, 14550,
14557, 14550, 14551, 14550), class = "Date"), DATASET = c(1L,
2L, 1L, 2L, 2L, 3L, 3L, 4L), REP = c(1L, 0L, 2L, 2L, 3L, 0L,
1L, 0L), WRONGS_ABS = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), WRONGS_RATIO = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), DONE = c(1L, 1L, 0L, 1L, 0L, 1L,
0L, 0L)), .Names = c("DATE", "DATASET", "REP", "WRONGS_ABS",
"WRONGS_RATIO", "DONE"), class = "data.frame", row.names = c(NA,
-8L))

today.sequence <- structure(list(DATE = structure(c(14551, 14550),
class = "Date"),
DATASET = 3:4, REP = c(1L, 0L), WRONGS_ABS = c(0L, 0L),
WRONGS_RATIO = c(0L,
0L), DONE = c(0L, 0L)), .Names = c("DATE", "DATASET", "REP",
"WRONGS_ABS", "WRONGS_RATIO", "DONE"), row.names = 7:8, class = "data.frame")

sequence[7,] # You should see '2009-11-03  3  1  0  0  0'

today.sequence # You can clearly see that sequence[7,] is the first
# row in today.sequence

sequence[7,] %in% today.sequence # This should show 'TRUE TRUE TRUE
# TRUE TRUE TRUE'.  Instead
# it shows 'FALSE FALSE FALSE FALSE FALSE FALSE'


Thanks

-- 
Kaushik Krishnan
(kaushik.s.krish...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Issue with %in% - not matching identical rows in data frames

2009-11-03 Thread Sundar Dorai-Raj
?"%in%" says x and table must be vectors. You supplied
data.frames. So %in% is coercing your today.sequence to a vector using

as.character(today.sequence)
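
To see what that coercion does, here is a minimal sketch on two toy data frames (x and tbl are made up for illustration and are not the poster's data). as.character() on a data frame returns one string per column, so a multi-row column collapses into a single deparsed string that no single value can equal. The exact deparsed text may differ between R versions, so it is printed rather than quoted here.

```r
# Toy data frames, assumed for illustration only
x   <- data.frame(DATASET = 3L, REP = 1L)             # one row
tbl <- data.frame(DATASET = 3:4, REP = c(1L, 0L))     # two rows

# One string per COLUMN, not per cell
x_chr   <- as.character(x)
tbl_chr <- as.character(tbl)
print(x_chr)     # the single values as strings
print(tbl_chr)   # each two-element column deparsed into one string

# %in% compares these strings, so a one-row x never matches a multi-row tbl
res <- x %in% tbl
print(res)
```

The result has one entry per column of x, and every entry is FALSE because each string in x_chr is compared against a string describing an entire column of tbl.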

Perhaps you should paste the columns together first:

x <- do.call(paste, c(sequence, sep = "::"))
table <- do.call(paste, c(today.sequence, sep = "::"))
x[7] %in% table

I'm not sure if this is what you want/need, but it does match your example.

HTH,

--sundar


Re: [R] Issue with %in% - not matching identical rows in data frames

2009-11-03 Thread Charles C. Berry



Kaushik,

The documentation doesn't quite tell (me, anyway) how the function behaves
when 'table' is a list (or data.frame). You'll need to dig into match.c
or experiment with match() or %in% to see what it is actually doing.

But it looks like it is matching whole columns of the data.frame rather
than elements within each column:

> sequence %in% sequence
[1] TRUE TRUE TRUE TRUE TRUE TRUE
> sequence %in% rev(sequence)
[1] TRUE TRUE TRUE TRUE TRUE TRUE
> sequence[1,] %in% sequence
[1] FALSE FALSE FALSE FALSE FALSE FALSE
> sequence[1,] %in% sequence[1,]
[1] TRUE TRUE TRUE TRUE TRUE TRUE

Maybe you wanted something like

mapply( function(x,y) x%in%y , sequence[7, ], today.sequence )

??
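
One caveat with a column-by-column check like this: it reports whether each value occurs somewhere in the corresponding column, not whether the whole row occurs as a row. The sketch below uses toy data (x and tbl are made up for illustration) to contrast it with a whole-row check built from pasted row keys. It assumes the "::" separator never appears inside the values.

```r
# Toy data frames, assumed for illustration only
x   <- data.frame(DATASET = c(3L, 3L), REP = c(1L, 0L))
tbl <- data.frame(DATASET = 3:4,       REP = c(1L, 0L))

# Column-wise: is each value of row 2 of x present anywhere in that column of tbl?
col_hits <- mapply(function(a, b) a %in% b, x[2, ], tbl)
print(col_hits)

# Row-wise: build one key per row, then compare the keys
x_key   <- do.call(paste, c(x,   sep = "::"))
tbl_key <- do.call(paste, c(tbl, sep = "::"))
row_hits <- x_key %in% tbl_key
print(row_hits)
```

Row 2 of x is (3, 0): both values occur in tbl's columns, so the column-wise check is all TRUE, yet no row of tbl is (3, 0), so the row-wise check is FALSE for it.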

HTH,

Chuck



Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901