The original post from Santosh came through as a BCC. I guess GoogleGroups did the BCC. Will need to do more investigation.
> Which are the rows in dt1 that aren't in dt2 Another option may be a 'not join'; e.g., X[-X[Y,which=TRUE]] or seq(1,nrow(X))[-X[Y,which=TRUE]] Will add something to docs/wiki re 'not joins'. Matthew On Wed, 2011-05-04 at 13:00 -0400, Steve Lianoglou wrote: > Hi, > > On Wed, May 4, 2011 at 12:23 PM, Santosh Srinivas > <[email protected]> wrote: > > Hi Steve, > > > > Sorry ... strange problem .. Dont know why that happened. > > > > http://groups.google.com/group/datatable/browse_thread/thread/51a0387e95d37feb > > It looks like your first email was sent to the @googlegroups.com > address (I didn't even know we had that setup), and the second one > came through the @lists.r-forge.r-project. > > So (I guess) the first didn't come through because it was sent to the > wrong(?) list -- anyway, in the future you should send to the > @lists.r-forge... one. > > > I had the question and my attempt to answer before someone says go read the > > manual :) > > It looks like the answer you offered is reasonable, though. > > In short -- the question was "How can I quickly tell which (keyed) > rows are in one data.table vs. another)". > > As you mentioned, you can do this by joining using `[` -- in order to > do this easily, you could ensure that each data.table has a column > that isn't in the other. > > For example, if you have data like so: > > > R> dt1 <- data.table(a=1:10, b=letters[1:10], key="a,b") > R> dt2 <- data.table(a=c(1, 3, 5, 10), b=letters[c(1, 3, 5, 10)], key="a,b") > > Doing either `dt1[dt2]` or `dt2[dt1]` doesn't get you anywhere too > fast (especially if one is just a subset of the other (like dt2 is to > dt1): > > R> dt1[dt2] > a b > [1,] 1 a > [2,] 3 c > [3,] 5 e > [4,] 10 j > > R> dt2[dt1] > a b > [1,] 1 a > [2,] 2 b > [3,] 3 c > [4,] 4 d > [5,] 5 e > [6,] 6 f > [7,] 7 g > [8,] 8 h > [9,] 9 i > [10,] 10 j > > Adding some 'dummy' columns may help: > > R> dt1$in.1 <- TRUE > R> dt2$in.2 <- TRUE > > Then you can (easily) ask which rows are in dt1 that aren't in dt2: > > R> dt2[dt1] ## nomatch=NA is the default > a b in.2 in.1 > [1,] 1 a TRUE TRUE > [2,] 2 b NA TRUE > [3,] 3 c TRUE TRUE > [4,] 4 d NA TRUE > [5,] 5 e TRUE TRUE > [6,] 6 f NA TRUE > [7,] 7 g NA TRUE > [8,] 8 h NA TRUE > [9,] 9 i NA TRUE > [10,] 10 j TRUE TRUE > > ## or more email friendly format: > R> which(is.na(dt2[dt1]$in.2)) > [1] 2 4 6 7 8 9 > > Which are the rows in dt1 that aren't in dt2 > > HTH, > -steve > _______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
