Hi,
This looks consistent with out-of-order factor levels. The binary search
relies on the levels being sorted. If that is the cause, please track down
the earlier point where the out-of-order levels were introduced; a fix may
be needed there. Everything else here is correct behaviour.
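
As a rough illustration of why sorted levels matter, here is a toy sketch in
base R. It is not data.table's actual lookup code, and `binary_search`, `bad`
and `good` are made-up names used only for this example. It assumes the usual
character collation, where "10" sorts before "9".

```r
# Toy stand-in for a keyed lookup: plain binary search over a character
# vector that is ASSUMED to be sorted. Returns the matching position, or
# NA_integer_ when the target is not found.
binary_search <- function(sorted_values, target) {
  lo <- 1L
  hi <- length(sorted_values)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (sorted_values[mid] == target) return(mid)
    if (sorted_values[mid] < target) lo <- mid + 1L else hi <- mid - 1L
  }
  NA_integer_
}

# Levels given in numeric order are not in character order.
bad <- factor(c("9", "10", "12"), levels = c("9", "10", "12"))
is.unsorted(levels(bad))          # TRUE: the levels are out of order
"9" %in% levels(bad)              # TRUE: the value is present...
binary_search(levels(bad), "9")   # NA: ...but the search misses it

# Rebuilding the factor from its character values re-sorts the levels.
good <- factor(as.character(bad))
is.unsorted(levels(good))         # FALSE
binary_search(levels(good), "9")  # 3
```

Running `is.unsorted(levels(m2$entrez.id))` on the affected table would be a
quick way to confirm or rule out this explanation.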
Matthew

On Fri, 2011-03-04 at 21:43 -0500, Steve Lianoglou wrote:
> Hi Mel,
> 
> On Fri, Mar 4, 2011 at 8:15 PM, Bacou, Melanie <[email protected]> wrote:
> > Steve,
> >
> > Try instead:
> >
> > R> m2[J(9)]
> >
> > It seems your original entrez.id key is integer not character
> 
> It's actually a factor:
> 
> R> is(m2$entrez.id)
> [1] "factor"   "integer"  "oldClass" "numeric"  "vector"
> 
> and moreover:
> 
> R> '9' %in% levels(m2$entrez.id)
> [1] TRUE
> 
> and the integer J() maneuver is a no go:
> 
> R> m2[J(9)]
> Error in `[.data.table`(m2, J(9)) :
>   x.entrez.id is a factor but joining to i.V1 which is not a factor.
> Factors must join to factors.
> 
> > -- but to be honest I'm not sure why:
> >
> > R> m2[9]
> >
> > doesn't work either...
> 
> That works, in that it does something, but it just gets the 9th row of
> m2, not the row whose key is '9'
> 
> Seems like something strange is afoot here ...
> 
> -steve
> 
> > --Mel.
> >
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Steve
> > Lianoglou
> > Sent: Friday, March 04, 2011 5:46 PM
> > To: [email protected]
> > Subject: [datatable-help] Something seems funky. I think with
> > character-to-factor conversion for keys (?)
> >
> > I'll have to apologize in advance because I can't create a
> > reproducible example for this behavior, but I'll keep trying .. please
> > bear with me.
> >
> > Somehow I've ended up with a data.table `m2` that looks like this:
> >
> > R> m2
> >      entrez.id total.tags.liver cds.liver intron.liver utr.liver
> >  [1,]         9               27         0            0         0
> >  [2,]        10              347         0            0         0
> >  [3,]        12             5076         0           17         0
> >  [4,]        13             2445         0            0         0
> >  [5,]        18             2076         0            0         0
> >  [6,]        20               15         0            0         0
> >  [7,]        25               62         0            0         0
> >  [8,]        32              320         0            0         0
> >  [9,]        34             1377         0            0         0
> > [10,]        35              757         0            0         0
> > First 10 rows of 5236 printed.
> >
> > R> key(m2)
> > [1] "entrez.id"
> >
> > R> any(duplicated(m2$entrez.id))
> > [1] FALSE
> >
> > So far so good -- I stumbled on the following problem when `merge`-ing
> > two large data tables, which was giving me a strange error. In the
> > process of trying to smoke out the problem, I noticed this unexpected
> > behavior:
> >
> > ## This is expected
> > R> subset(m2, entrez.id == '9')
> >     entrez.id total.tags.liver cds.liver intron.liver utr.liver
> > [1,]         9               27         0            0         0
> >
> > ## This isn't
> > R> m2['9']
> >     entrez.id total.tags.liver cds.liver intron.liver utr.liver
> > [1,]         9               NA        NA           NA        NA
> >
> > Woops! Isn't that supposed to return the same as above?
> >
> > I can fix `m2` by manipulating the key column:
> >
> > R> key(m2) <- NULL ## probably not necessary
> > R> m2$entrez.id <- as.character(m2$entrez.id)
> > R> key(m2) <- 'entrez.id'
> > R> m2['9']
> >     entrez.id total.tags.liver cds.liver intron.liver utr.liver
> > [1,]         9               27         0            0         0
> >
> > (side note: the bug I mentioned when I try to `merge` this w/ another
> > data.table is gone after I did the above fix).
> >
> > So -- I guess my point is that I'm not exactly sure how I got `m2` to
> > have a funky key, but the fact that it got messed up like this somehow
> > I think is undesired behavior, no?
> >
> > Does this point to something (maybe obvious) that happened on the way
> > to building up `m2`?
> >
> > Thanks,
> > -steve
> >
> > --
> > Steve Lianoglou
> > Graduate Student: Computational Systems Biology
> >  | Memorial Sloan-Kettering Cancer Center
> >  | Weill Medical College of Cornell University
> > Contact Info: http://cbio.mskcc.org/~lianos/contact
> > _______________________________________________
> > datatable-help mailing list
> > [email protected]
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >


