[R] Issue with %in% - not matching identical rows in data frames
Hi folks,

I have two data frames. I know that the nth (let's say the 7th) row in the first data frame (sequence) is there in the second (today.sequence). When I try to check that by doing 'sequence[7,] %in% today.sequence', I get all FALSE when it should be all TRUE.

I'm certain I'm making some trivial mistake. Any solutions?

The code to recreate the data frames and see for yourself is:

    sequence <- structure(list(DATE = structure(c(14549, 14549, 14553, 14550,
        14557, 14550, 14551, 14550), class = "Date"),
        DATASET = c(1L, 2L, 1L, 2L, 2L, 3L, 3L, 4L),
        REP = c(1L, 0L, 2L, 2L, 3L, 0L, 1L, 0L),
        WRONGS_ABS = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
        WRONGS_RATIO = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
        DONE = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L)),
        .Names = c("DATE", "DATASET", "REP", "WRONGS_ABS", "WRONGS_RATIO", "DONE"),
        class = "data.frame", row.names = c(NA, -8L))

    today.sequence <- structure(list(DATE = structure(c(14551, 14550), class = "Date"),
        DATASET = 3:4, REP = c(1L, 0L), WRONGS_ABS = c(0L, 0L),
        WRONGS_RATIO = c(0L, 0L), DONE = c(0L, 0L)),
        .Names = c("DATE", "DATASET", "REP", "WRONGS_ABS", "WRONGS_RATIO", "DONE"),
        row.names = 7:8, class = "data.frame")

    sequence[7,]
    # You should see '2009-11-03 3 1 0 0 0'

    today.sequence
    # You can clearly see that sequence[7,] is the first row in today.sequence

    sequence[7,] %in% today.sequence
    # This should show 'TRUE TRUE TRUE TRUE TRUE TRUE'. Instead
    # it shows 'FALSE FALSE FALSE FALSE FALSE FALSE'

Thanks

--
Kaushik Krishnan (kaushik.s.krish...@gmail.com)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Issue with %in% - not matching identical rows in data frames
?"%in%" says x and table must be vectors. You supplied data.frames, so %in% is coercing your today.sequence to a vector using as.character(today.sequence).

Perhaps you should paste the columns together first:

    x <- do.call(paste, c(sequence, sep = "::"))
    table <- do.call(paste, c(today.sequence, sep = "::"))
    x[7] %in% table

I'm not sure if this is what you want/need, but it does match your example.

HTH,

--sundar

On Tue, Nov 3, 2009 at 7:53 AM, Kaushik Krishnan <kaushik.s.krish...@gmail.com> wrote:
> When I try to check that by doing 'sequence[7,] %in% today.sequence',
> I get all FALSE when it should be all TRUE.
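For anyone reading this in the archive, here is a minimal, self-contained sketch of the paste-the-columns idea on a tiny made-up data frame. The names `big`, `small` and `row_key` are hypothetical stand-ins, not objects from the thread, and the `unname()` step is an extra precaution I added so that a column called `sep` or `collapse` cannot be mistaken for an argument of paste().

```r
# Small stand-ins for the data frames in the thread (values are made up).
big <- data.frame(DATASET = c(1L, 2L, 3L, 4L),
                  REP     = c(1L, 0L, 1L, 0L),
                  DONE    = c(1L, 1L, 0L, 0L))
small <- big[3:4, ]

# Hypothetical helper: build one string key per row by pasting the columns
# together, so that rows can be compared as ordinary character vectors.
row_key <- function(df) {
  do.call(paste, c(unname(as.list(df)), sep = "::"))
}

keys_big   <- row_key(big)
keys_small <- row_key(small)

# Is row 3 of `big` present in `small`?
keys_big[3] %in% keys_small

# Which rows of `big` are present in `small`?
keys_big %in% keys_small
```

One caveat with this approach: if any value can itself contain the separator string, two different rows could produce the same key, so the separator should be something that cannot occur in the data.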
Re: [R] Issue with %in% - not matching identical rows in data frames
Kaushik,

The documentation doesn't quite tell (me, anyway) how the function behaves when 'table' is a list (or data.frame). You'll need to dig into match.c or experiment with match() or %in% to see what it is actually doing.

But it looks like it is matching whole columns of the data.frame rather than elements within each column:

    > sequence %in% sequence
    [1] TRUE TRUE TRUE TRUE TRUE TRUE
    > sequence %in% rev(sequence)
    [1] TRUE TRUE TRUE TRUE TRUE TRUE
    > sequence[1,] %in% sequence
    [1] FALSE FALSE FALSE FALSE FALSE FALSE
    > sequence[1,] %in% sequence[1,]
    [1] TRUE TRUE TRUE TRUE TRUE TRUE

Maybe you wanted something like

    mapply(function(x, y) x %in% y, sequence[7, ], today.sequence)

??

HTH,

Chuck

On Tue, 3 Nov 2009, Kaushik Krishnan wrote:
> When I try to check that by doing 'sequence[7,] %in% today.sequence',
> I get all FALSE when it should be all TRUE.

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
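A note on the mapply() suggestion, as a sketch on made-up data: it checks each column separately, so it can report all TRUE for a row whose values each occur somewhere in the second data frame but never together in one row. If a whole-row test is what is wanted, one possible approach is to compare column by column and then require agreement across all columns. The names `big`, `small`, `mixed` and `row_in` are hypothetical, and the sketch assumes no NA values.

```r
# Small stand-ins for the data frames in the thread (values are made up).
big <- data.frame(DATASET = c(1L, 2L, 3L, 4L),
                  REP     = c(1L, 0L, 1L, 0L),
                  DONE    = c(1L, 1L, 0L, 0L))
small <- big[3:4, ]

# Column-by-column check: is each value of the row present somewhere in
# the corresponding column of `small`?
per_column <- mapply(function(x, y) x %in% y, big[3, ], small)

# A row that is NOT in `small`, although each of its values occurs in the
# matching column of `small` (3 in row 1, the zeros in row 2).
mixed <- data.frame(DATASET = 3L, REP = 0L, DONE = 0L)
per_column_mixed <- mapply(function(x, y) x %in% y, mixed, small)

# Hypothetical helper for a whole-row test: for each column, mark the rows
# of `df` that equal the value in `row`, then keep only the rows where
# every column agreed. Assumes `row` has exactly one row and no NAs.
row_in <- function(row, df) {
  hits <- Reduce(`&`, Map(function(x, y) y == x, row, df))
  any(hits)
}

row_in(big[3, ], small)  # row 3 of `big` is a row of `small`
row_in(mixed, small)     # `mixed` is not, despite the per-column result
```

Whether the per-column or the whole-row behaviour is the right one depends on the question being asked; the original post reads as if the whole-row test was intended.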