Thanks Marc!

On Mon, Jun 8, 2015 at 3:12 PM, Marc Carlson <mcarl...@fredhutch.org> wrote:

> OK Jim,
>
> I will put very simple messages in (one liners) that will simply state
> whether the relationship between keys and the requested columns was 1:1,
> 1:many, many:1, or many:many. Hopefully this will represent an acceptable
> compromise.
>
> Marc
>
>
> On 06/05/2015 08:37 AM, James W. MacDonald wrote:
>
> I agree that a warning is probably not the way to go, as it does imply
> that there might have been something wrong with either the input or output.
> Plus, not everybody understands the distinction between error and warning.
>
> And having additional documentation can't possibly hurt. But that
> assumes that most/some/all of the end users both peruse and understand the
> documentation, which we all know is not the case. The main issue, for me at
> least, is that a significant proportion of people seem to think there is
> some sort of uniqueness imposed on things like Entrez Gene IDs and Hugo
> symbols, etc. While that is the ultimate goal, we do not have and maybe
> never will achieve unique IDs for each annotatable object.
>
> I used to work for a PI who was a very smart, well informed statistical
> geneticist who was absolutely shocked when I informed her that a) there are
> SNPs in dbSNP that have more than one RS ID, and that b.) there are RS IDs
> in dbSNP that have been assigned to multiple SNPs. She just assumed that
> there was a one-to-one RS ID -> SNP mapping.
>
> So this is to me the crux of the problem. It is perfectly valid to
> return one-to-many mappings, and that is what should be expected *by
> those of us who already understand such things.* But for those of us who
> are ignorant of the details, and those who assume uniqueness of IDs, it
> would be really nice if they got a message telling them something like
>
> *Please note that there are one-to-many mappings between the input and
> output IDs, so the output is longer than your input vector.
> Please see ?select for more detail.*
>
> And if the message is objectionable to some, you could give the option
> for people to set a global flag to shut it off. Something like
>
>     if(!pleaseMakeItStop)
>         message(<message goes here>)
>
> and they could set
>
>     pleaseMakeItStop = TRUE in their .Rprofile
>
> Is that a reasonable compromise?
>
> Jim
>
>
> On Thu, Jun 4, 2015 at 6:06 PM, Marc Carlson <mcarl...@fredhutch.org>
> wrote:
>
>> Hi Jim,
>>
>> I do agree that the warning was protective for that (this is why I put it
>> there).
>>
>> But it was also annoying for many and a source of some confusion because
>> when people see a warning() they think that something has gone wrong with
>> the code that was just run. And in this case the select method was
>> actually doing exactly what it was supposed to be doing. What it was
>> actually warning you about was what you did separately in that assignment
>> to fit2... Which is the step right after the select method already did
>> its work. And I can understand why that seems a little bit confusing
>> since you are basically telling someone to be careful with the data you
>> just gave them.
>>
>> Now I could replace it with a message() I guess, but in cases like this
>> where the warning is about something that happens outside of the function
>> you are calling, shouldn't that probably be handled by documentation? Or
>> at least, that is the argument that finally persuaded me to remove it.
>> That and the fact that almost every call to select() ended up accompanied
>> by the warning you mentioned, because it turns out that perfect 1:1
>> relationships are pretty rare for annotation data. Very often, you are
>> going to get back multiple results.
>>
>> But I didn't just remove the warning, I also supplied an alternative for
>> people who have a real need for consistent 1:1 mapping.
>>
>> The mapIds() method takes most of the same arguments as select(), except
>> that unlike select(), it only looks up one column and it always returns a
>> vector that is the same size as the vector that came in.
>>
>> So for your example, you could do something like this pseudocode here:
>>
>>     mapIds(<chippackage>, featureNames(eset), column="ENTREZID",
>>            keytype="PROBEID")
>>
>> And mapIds will follow a rule specified by the default value for the
>> multiVals argument so that you can get back your results in a 1:1 manner.
>> And if you don't like any of the options available for the multiVals
>> argument, you can make your own function and pass it in.
>>
>>
>> Anyhow please continue to let us know what you think?
>>
>>
>> Marc
>>
>>
>> On 06/04/2015 10:50 AM, James W. MacDonald wrote:
>>
>>> In the last release, the warning message from select() telling people
>>> that their results include one-to-many mappings was removed. While some
>>> may find this warning annoying, I think silently returning something
>>> unexpected to our users is dangerous.
>>>
>>> In other words, for me it is a common practice to do something like this:
>>>
>>> fit <- lmFit(eset, design)
>>> fit2 <- eBayes(fit)
>>> gns <- select(<chippackage>, featureNames(eset), c("ENTREZID","SYMBOL"))
>>> gns <- gns[!duplicated(gns[,1]),]
>>> fit2$genes <- gns
>>>
>>> I add in the step where dups are removed because I already know they are
>>> there. But a naive user might instead do
>>>
>>> fit2$genes <- select(<chippackage>, featureNames(eset),
>>> c("ENTREZID","SYMBOL"))
>>>
>>> Which will work just fine, but then all the annotation (except for the
>>> first few lines) will now be completely incorrect, and there wasn't a
>>> warning to let the end user know that they may have made a mistake.
>>>
>>> lmFit() will parse the featureData slot of an ExpressionSet and use those
>>> data for annotation, so that gives some hypothetical protections, for
>>> those who first put their annotation data into their ExpressionSet.
>>> However, ?eSet says:
>>>
>>>      ‘featureData’: Contains variables describing features (i.e., rows
>>>           in ‘assayData’) unique to this experiment. Use the
>>>           ‘annotation’ slot to efficiently reference feature data
>>>           common to the annotation package used in the experiment.
>>>           Class: ‘AnnotatedDataFrame-class’
>>>
>>> Which to me indicates that the featureData slot isn't really intended to
>>> contain annotation data, but instead some unique information that
>>> pertains to a given experiment. But maybe I misunderstand.
>>>
>>> Is the featureData slot actually intended for annotation data? If not,
>>> what is the intended pipeline for annotating data in an ExpressionSet?
>>> Am I alone in being concerned about this?
>>>
>>> Best,
>>>
>>> Jim
>>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
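
The row-count mismatch discussed in this thread can be reproduced without any annotation package. The following is only a minimal sketch in base R, assuming a made-up probe table: anno, toy_select(), toy_map_first(), mapping_type() and the toy.quiet option are all hypothetical names invented for illustration and are not part of AnnotationDbi. It shows a lookup that returns one row per match (so a 1:many key yields extra rows), a one-line message reporting the relationship type that can be switched off globally, and a first-match lookup that always returns one value per key.

```r
# Hypothetical probe-to-gene table standing in for an annotation package.
anno <- data.frame(
  PROBEID  = c("p1", "p2", "p2", "p3"),
  ENTREZID = c("10", "20", "21", "30"),
  stringsAsFactors = FALSE
)
keys <- c("p1", "p2", "p3")

# Classify the key/value relationship in a lookup result.
mapping_type <- function(tab) {
  pairs <- unique(tab[, c("PROBEID", "ENTREZID")])
  key_multi <- anyDuplicated(pairs$PROBEID) > 0   # some key has several values
  val_multi <- anyDuplicated(pairs$ENTREZID) > 0  # some value has several keys
  if (key_multi && val_multi) {
    "many:many"
  } else if (key_multi) {
    "1:many"
  } else if (val_multi) {
    "many:1"
  } else {
    "1:1"
  }
}

# Toy stand-in for a select()-style lookup: one row per match.
# Unlike a real annotation lookup, unmatched keys are simply dropped here.
# The message can be silenced with options(toy.quiet = TRUE), e.g. in .Rprofile.
toy_select <- function(tab, keys, quiet = getOption("toy.quiet", FALSE)) {
  out <- tab[tab$PROBEID %in% keys, , drop = FALSE]
  if (!quiet) {
    message("toy_select() returned ", mapping_type(out),
            " mapping between keys and columns")
  }
  out
}

# Toy stand-in for a first-match lookup: always one value per key.
toy_map_first <- function(tab, keys) {
  setNames(tab$ENTREZID[match(keys, tab$PROBEID)], keys)
}

gns <- toy_select(anno, keys)     # emits the one-line message
nrow(gns)                         # 4 rows for 3 keys
ids <- toy_map_first(anno, keys)
length(ids) == length(keys)       # TRUE
```

Assigning gns directly alongside a 3-row object would misalign every row after the duplicated probe, which is the failure mode described above; the first-match vector has the same length as keys by construction.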