Hi,

On Sat, Mar 5, 2011 at 4:06 PM, Matthew Dowle <[email protected]> wrote:
> Hi,
> Seems consistent with out of order factor levels. The binary search
> relies on levels being sorted. If that's it then please track down the
> earlier point where the out-of-order factor levels were introduced and
> maybe a fix is needed there. Everything else here is correct behaviour.

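For what it's worth, here is a rough base-R sketch of how one might check for the condition Matthew describes. `levels_sorted` is a hypothetical helper, not a data.table function, and it assumes "sorted" means the order base R's sort() gives in the current locale. The toy factor is made up for illustration and is not taken from `m2`.

```r
# Hypothetical helper (not part of data.table): TRUE when a factor's levels
# are already in the order that base R's sort() would put them in.
levels_sorted <- function(f) {
  stopifnot(is.factor(f))
  lv <- levels(f)
  identical(lv, sort(lv))
}

# Toy factor whose levels follow numeric order rather than character order.
bad <- factor(c("9", "10", "12"), levels = c("9", "10", "12"))
levels_sorted(bad)    # FALSE: as strings, "10" and "12" sort before "9"

# Rebuilding the factor from its character values re-sorts the levels.
good <- factor(as.character(bad))
levels_sorted(good)   # TRUE
```

Rebuilding through as.character() keeps the values and only changes the order of the levels, which is essentially the same manipulation as the workaround quoted further down the thread.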
I know it sounds lame, but I'm having problems tracking down how my
key/factor column arrived at having out of order levels.

While I try to smoke that out, do you think it would be a good idea to
write a small utility at the C level that scans through the levels() of
factor keys to check that they are in order, short-circuiting as soon as
it finds one level that's out of order?

This way we can fire off a warning when this problem is detected, so the
user would be warned to expect "weird" behavior (and also know how to
fix it(?)).

I'm not sure exactly where/when we would invoke that test -- maybe after
calls to setkey ... and optionally under merge-like operations.

I can take a crack at doing that if it seems like a good idea.

-steve

> Matthew
>
> On Fri, 2011-03-04 at 21:43 -0500, Steve Lianoglou wrote:
>> Hi Mel,
>>
>> On Fri, Mar 4, 2011 at 8:15 PM, Bacou, Melanie <[email protected]> wrote:
>> > Steve,
>> >
>> > Try instead:
>> >
>> > R> m2[J(9)]
>> >
>> > It seems your original entrez.id key is integer not character
>>
>> It's actually a factor:
>>
>> R> is(m2$entrez.id)
>> [1] "factor" "integer" "oldClass" "numeric" "vector"
>>
>> and moreover:
>>
>> R> '9' %in% levels(m2$entrez.id)
>> [1] TRUE
>>
>> and the integer J() maneuver is a no go:
>>
>> R> Error in `[.data.table`(m2, J(9)) :
>>   x.entrez.id is a factor but joining to i.V1 which is not a factor.
>> Factors must join to factors.
>>
>> > -- but to be honest I'm not sure why:
>> >
>> > R> m2[9]
>> >
>> > doesn't work either...
>>
>> That works, in that it does something, but it just gets the 9th row of
>> m2, not the row whose key is '9'
>>
>> Seems like something strange is afoot here ...
>>
>> -steve
>>
>> > --Mel.
>> >
>> > -----Original Message-----
>> > From: [email protected]
>> > [mailto:[email protected]] On Behalf Of Steve
>> > Lianoglou
>> > Sent: Friday, March 04, 2011 5:46 PM
>> > To: [email protected]
>> > Subject: [datatable-help] Something seems funky. I think with
>> > character-to-factor conversion for keys (?)
>> >
>> > I'll have to apologize in advance because I can't create a
>> > reproducible example for this behavior, but I'll keep trying .. please
>> > bear with me.
>> >
>> > Somehow I've ended up with a data.table `m2` that looks like this:
>> >
>> > R> m2
>> >       entrez.id total.tags.liver cds.liver intron.liver utr.liver
>> >  [1,]         9               27         0            0         0
>> >  [2,]        10              347         0            0         0
>> >  [3,]        12             5076         0           17         0
>> >  [4,]        13             2445         0            0         0
>> >  [5,]        18             2076         0            0         0
>> >  [6,]        20               15         0            0         0
>> >  [7,]        25               62         0            0         0
>> >  [8,]        32              320         0            0         0
>> >  [9,]        34             1377         0            0         0
>> > [10,]        35              757         0            0         0
>> > First 10 rows of 5236 printed.
>> >
>> > R> key(m2)
>> > [1] "entrez.id"
>> >
>> > R> any(duplicated(m2$entrez.id))
>> > [1] FALSE
>> >
>> > So far so good -- I stumbled on the following problem when `merge`-ing
>> > two large data tables which was giving me a stranger error. In the
>> > process of trying to smoke out the problem, I notice this unexpected
>> > behavior:
>> >
>> > ## This is expected
>> > R> subset(m2, entrez.id == '9')
>> >       entrez.id total.tags.liver cds.liver intron.liver utr.liver
>> >  [1,]         9               27         0            0         0
>> >
>> > ## This isn't
>> > R> m2['9']
>> >       entrez.id total.tags.liver cds.liver intron.liver utr.liver
>> >  [1,]         9               NA        NA           NA        NA
>> >
>> > Woops! Isn't that supposed to return the same as above?
>> >
>> > I can fix `m2` by manipulating the key column:
>> >
>> > R> key(m2) <- NULL ## probably not necessary
>> > R> m2$entrez.id <- as.character(m2$entrez.id)
>> > R> key(m2) <- 'entrez.id'
>> > R> m2['9']
>> >       entrez.id total.tags.liver cds.liver intron.liver utr.liver
>> >  [1,]         9               27         0            0         0
>> >
>> > (side note: the bug I mentioned when I try to `merge` this w/ another
>> > data.table is gone after I did the above fix).
>> >
>> > So -- I guess my point is that I'm not exactly sure how I got `m2` to
>> > have a funky key, but the fact that it got messed up like this somehow
>> > I think is undesired behavior, no?
>> >
>> > Does this point to something (maybe obvious) that happened on the way
>> > to building up `m2`?
>> >
>> > Thanks,
>> > -steve
>> >
>> > --
>> > Steve Lianoglou
>> > Graduate Student: Computational Systems Biology
>> >  | Memorial Sloan-Kettering Cancer Center
>> >  | Weill Medical College of Cornell University
>> > Contact Info: http://cbio.mskcc.org/~lianos/contact
>> > _______________________________________________
>> > datatable-help mailing list
>> > [email protected]
>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
