Re: [Bioc-devel] Changes in AnnotationDbi

Simon Anders Tue, 09 Jun 2015 02:53:52 -0700

Hi

My two cents:


On 04/06/15 19:50, James W. MacDonald wrote:

> In other words, for me it is a common practice to do something like this:
>
> fit <- lmFit(eset, design)
> fit2 <- eBayes(fit)
> gns <- select(<chippackage>, featureNames(eset), c("ENTREZID","SYMBOL"))
> gns <- gns[!duplicated(gns[,1]),]
> fit2$genes <- gns
>
> I add in the step where dups are removed because I already know they are
> there. But a naive user might instead do
>
> fit2$genes <- select(<chippackage>, featureNames(eset),
>                      c("ENTREZID","SYMBOL"))

I'm not even that happy with James' first solution, as it relies on the order being correct after removing the duplicates. I'd feel safer to use 'match' to ensure that. (What if an EntrezId is not found in the Annotation DB? Will we have a line with NA, or is the line simply missing? The latter would break James' code.)
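To illustrate the 'match' idea, here is a minimal sketch using a made-up data frame in place of a real 'select' result. The IDs, the values and the names 'ids', 'idx' and 'aligned' are all invented for illustration; only the 'PROBEID', 'ENTREZID' and 'SYMBOL' column names follow the usual chip-package conventions.

```r
# Toy stand-in for the data frame returned by select(); all values are made up.
ids <- c("p1", "p2", "p3", "p4")         # plays the role of featureNames(eset)
gns <- data.frame(
  PROBEID  = c("p3", "p1", "p1", "p4"),  # p1 duplicated, p2 absent, order scrambled
  ENTREZID = c("30", "10", "11", "40"),
  SYMBOL   = c("C", "A", "A2", "D"),
  stringsAsFactors = FALSE
)

# For each id, match() gives the index of its first occurrence in gns$PROBEID,
# or NA if the id does not occur at all.
idx <- match(ids, gns$PROBEID)

# Indexing with idx yields exactly one row per id, in the order of ids;
# an NA index produces a row of NAs rather than a missing row.
aligned <- gns[idx, c("ENTREZID", "SYMBOL")]
rownames(aligned) <- ids
```

With this approach the result has one row per feature whether or not the lookup returned duplicates, dropped a key, or changed the order.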

What users really want here is a way to get the "preferred" symbol for an entrezId, and for lack of this, they accept simply a random one or the first one (in some unspecified collation). So, we should have a function, maybe 'select1', to select one and only one hit for each query value.


  select1(x, keys, columns, keytype, requireUnique=FALSE, ... )

This would query the AnnotationDbi object 'x' as does 'select', but return a data frame with the columns specified in 'columns', and the vector that was passed as 'keys' as row names, thus guaranteeing that each line in the data frame corresponds to one query key. If there were multiple records for a key, the first one is used, unless 'requireUnique' is set, in which case an error is issued. And if no record is present for a key, the data frame contains a row of NAs for this key.
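A rough sketch of how this could be written on top of the existing 'select' is given below. It is not an existing AnnotationDbi function: 'select1' is only the proposal above, 'firstHitPerKey' is a hypothetical helper name, and the toy data frame is made up. The sketch assumes that 'keys' are unique (they become row names) and that the 'select' result includes the 'keytype' column.

```r
# Hypothetical helper: reduce a select()-style data frame 'res' to one row per key.
# 'res' must contain a column named by 'keytype' that holds the query keys.
firstHitPerKey <- function(res, keys, columns, keytype, requireUnique = FALSE) {
  keys <- as.character(keys)
  if (anyDuplicated(keys)) stop("'keys' must be unique to serve as row names")
  res <- unique(res)                      # identical records count only once
  hits <- as.character(res[[keytype]])
  if (requireUnique) {
    dup <- unique(hits[duplicated(hits) & hits %in% keys])
    if (length(dup) > 0)
      stop("multiple records for key(s): ", paste(dup, collapse = ", "))
  }
  # match() picks the first record per key; an absent key gives a row of NAs
  out <- res[match(keys, hits), columns, drop = FALSE]
  rownames(out) <- keys
  out
}

# Sketch of the proposed function; requires the AnnotationDbi package when called.
select1 <- function(x, keys, columns, keytype, requireUnique = FALSE, ...) {
  res <- AnnotationDbi::select(x, keys = keys,
                               columns = unique(c(keytype, columns)),
                               keytype = keytype, ...)
  firstHitPerKey(res, keys, columns, keytype, requireUnique)
}

# Toy check of the helper, without any annotation package (values are made up)
toy <- data.frame(PROBEID = c("p1", "p1", "p3"),
                  SYMBOL  = c("A", "A2", "C"),
                  stringsAsFactors = FALSE)
firstHitPerKey(toy, c("p1", "p2", "p3"), "SYMBOL", "PROBEID")
```

Keeping the one-row-per-key logic in a separate helper means it can be tried out on a plain data frame, while 'select1' itself only adds the database query.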


This would be quite convenient for any kind of ID conversion issues.

  Simon

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
