Hi,

This seems consistent with out-of-order factor levels. The binary search relies on the levels being sorted. If that's the cause, then please track down the earlier point where the out-of-order factor levels were introduced; a fix may be needed there. Everything else here is correct behaviour.

Matthew
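
A minimal sketch of why sorted levels matter, in base R only. It is not data.table's actual implementation: `ids`, `f_sorted`, `f_unsorted` and `binary_search()` are hypothetical names made up for illustration, and the data is a stand-in rather than the real `m2`. It assumes the lookup is a plain binary search over the level strings.

```r
# Hypothetical stand-in for the key values (not the real m2 data).
ids <- c("9", "10", "12", "13", "18")

# factor() sorts its levels by default, so these levels are in order.
f_sorted <- factor(ids)

# Supplying levels explicitly in data order leaves them out of order,
# because "9" sorts after "10" when compared as strings.
f_unsorted <- factor(ids, levels = ids)

is.unsorted(levels(f_sorted))    # FALSE
is.unsorted(levels(f_unsorted))  # TRUE

# Illustrative binary search over a character vector assumed to be sorted.
# Returns the position of x, or NA if the search does not find it.
binary_search <- function(x, sorted_vec) {
  lo <- 1L
  hi <- length(sorted_vec)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (sorted_vec[mid] == x) return(mid)
    if (sorted_vec[mid] < x) lo <- mid + 1L else hi <- mid - 1L
  }
  NA_integer_
}

binary_search("9", levels(f_sorted))    # found
binary_search("9", levels(f_unsorted))  # NA, although "9" is a level

# Rebuilding the factor from its character values re-sorts the levels.
f_fixed <- factor(as.character(f_unsorted))
is.unsorted(levels(f_fixed))            # FALSE
```

The last step mirrors the as.character() workaround quoted below: converting to character and rebuilding discards the out-of-order levels. Finding where they were introduced in the first place is still the real fix.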
On Fri, 2011-03-04 at 21:43 -0500, Steve Lianoglou wrote:
> Hi Mel,
>
> On Fri, Mar 4, 2011 at 8:15 PM, Bacou, Melanie <[email protected]> wrote:
> > Steve,
> >
> > Try instead:
> >
> > R> m2[J(9)]
> >
> > It seems your original entrez.id key is integer not character
>
> It's actually a factor:
>
> R> is(m2$entrez.id)
> [1] "factor" "integer" "oldClass" "numeric" "vector"
>
> and moreover:
>
> R> '9' %in% levels(m2$entrez.id)
> [1] TRUE
>
> and the integer J() maneuver is a no go:
>
> R> Error in `[.data.table`(m2, J(9)) :
> x.entrez.id is a factor but joining to i.V1 which is not a factor.
> Factors must join to factors.
>
> > -- but to be honest I'm not sure why:
> >
> > R> m2[9]
> >
> > doesn't work either...
>
> That works, in that it does something, but it just gets the 9th row of
> m2, not the row whose key is '9'
>
> Seems like something's strange is afoot here ...
>
> -steve
>
> > --Mel.
> >
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Steve
> > Lianoglou
> > Sent: Friday, March 04, 2011 5:46 PM
> > To: [email protected]
> > Subject: [datatable-help] Something seems funky. I think with
> > character-to-factor conversion for keys (?)
> >
> > I'll have to apologize in advance because I can't create a
> > reproducible example for this behavior, but I'll keep trying .. please
> > bear with me.
> >
> > Somehow I've ended up with a data.table `m2` that looks like this:
> >
> > R> m2
> >       entrez.id total.tags.liver cds.liver intron.liver utr.liver
> >  [1,]         9               27         0            0         0
> >  [2,]        10              347         0            0         0
> >  [3,]        12             5076         0           17         0
> >  [4,]        13             2445         0            0         0
> >  [5,]        18             2076         0            0         0
> >  [6,]        20               15         0            0         0
> >  [7,]        25               62         0            0         0
> >  [8,]        32              320         0            0         0
> >  [9,]        34             1377         0            0         0
> > [10,]        35              757         0            0         0
> > First 10 rows of 5236 printed.
> >
> > R> key(m2)
> > [1] "entrez.id"
> >
> > R> any(duplicated(m2$entrez.id))
> > [1] FALSE
> >
> > So far so good -- I stumbled on the following problem when `merge`-ing
> > two large data tables which was giving me a stranger error. In the
> > process of trying to smoke out the problem, I notice this unexpected
> > behavior:
> >
> > ## This is expected
> > R> subset(m2, entrez.id == '9')
> >      entrez.id total.tags.liver cds.liver intron.liver utr.liver
> > [1,]         9               27         0            0         0
> >
> > ## This isn't
> > R> m2['9']
> >      entrez.id total.tags.liver cds.liver intron.liver utr.liver
> > [1,]         9               NA        NA           NA        NA
> >
> > Woops! Isn't that supposed to return the same as above?
> >
> > I can fix `m2` by manipulating the key column:
> >
> > R> key(m2) <- NULL ## probably not necessary
> > R> m2$entrez.id <- as.character(m2$entrez.id)
> > R> key(m2) <- 'entrez.id'
> > R> m2['9']
> >      entrez.id total.tags.liver cds.liver intron.liver utr.liver
> > [1,]         9               27         0            0         0
> >
> > (side note: the bug I mentioned when I try to `merge` this w/ another
> > data.table is gone after I did the above fix).
> >
> > So -- I guess my point is that I'm not exactly sure how I got `m2` to
> > have a funky key, but the fact that it got messed up like this somehow
> > I think is undesired behavior, no?
> >
> > Does this point to something (maybe obvious) that happened on the way
> > to building up `m2`?
> >
> > Thanks,
> > -steve
> >
> > --
> > Steve Lianoglou
> > Graduate Student: Computational Systems Biology
> > | Memorial Sloan-Kettering Cancer Center
> > | Weill Medical College of Cornell University
> > Contact Info: http://cbio.mskcc.org/~lianos/contact
> > _______________________________________________
> > datatable-help mailing list
> > [email protected]
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
