Hi,

On Fri, Oct 31, 2014 at 2:35 PM, Thomas Lin Pedersen <thomas...@gmail.com> wrote:
> With regards to abstraction - I would personally much rather read and write
> code that contained plotScores() and plotScree() etc. where the intent of the
> code is clearly communicated, instead of relying on a plot() function whose
> result is only known from experience. Trying to squeeze every kind of visual
> output into the same plot generic seems artificial and constrained to me. I
> totally agree on the plotPCA critique on the other hand...

If we've bought a ticket to ride on Kevin's and Michael's (and whoever
else) train of thought, wouldn't plot(pca(x), type='scree') or
plot(pca(x), type='scores') be the preferred way to go ... for some
definition of "preferable"?

-steve

>
> Thomas
>
>
>> On 31 Oct 2014, at 22:09, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>>
>> I strongly agree with Kevin's position. plotPCA() represents two separate
>> concerns in its very name: the computation and the rendering. Those need to
>> be separated, at least behind the scenes. The syntax of plot(pca(x)) is
>> preferable to plotPCA, because the structure of the operation is represented
>> in the expression itself, not just in a non-computable function name.
>>
>> With regard to how a plot,PCA should behave: there is always a tension
>> between high-level and low-level APIs. In the end, we need multiple levels
>> of abstraction. While high-level APIs sacrifice flexibility, we need them
>> because they communicate the high-level *intent* of the user in the code
>> itself (self-documenting code), and they enable reusability, which not only
>> reduces redundant effort but also ensures consistency. Once our brains no
>> longer need to parse low-level code, we can focus our mental power on
>> correctness and efficiency. To design a high-level API, one needs to
>> carefully analyze user requirements, i.e., the use cases. To choose the
>> default behavior, one needs to rate the use cases by their prevalence, and
>> by how closely they match the intuition-based expectations of the user.
>>
>> The fact that at least 9 packages are performing such a similar task seems
>> to indicate that a common abstraction is warranted, but I am not sure if
>> BiocGenerics is the appropriate place.
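
To make the type= idea concrete, here is a minimal sketch in R. It assumes
a hypothetical pca() constructor and an S3 class called "PCA" (neither
comes from an existing package), uses stats::prcomp() for the computation,
and keeps plotScores()/plotScree() as thin wrappers so that both spellings
are available. It only illustrates the shape of such an API; it is not a
proposal for what BiocGenerics should contain.

```r
# Hypothetical constructor: run the computation once and keep the result.
pca <- function(x, ...) {
  fit <- stats::prcomp(x, ...)
  structure(list(scores = fit$x, sdev = fit$sdev), class = "PCA")
}

# One S3 plot method; `type` selects the view, the first choice is the default.
plot.PCA <- function(x, type = c("scores", "scree"), ...) {
  type <- match.arg(type)
  switch(type,
    scores = plot(x$scores[, 1:2], ...),
    scree  = barplot(x$sdev^2 / sum(x$sdev^2),
                     xlab = "Component",
                     ylab = "Proportion of variance", ...)
  )
  invisible(type)
}

# Named wrappers keep the intent visible at the call site.
plotScores <- function(object, ...) plot(object, type = "scores", ...)
plotScree  <- function(object, ...) plot(object, type = "scree", ...)

# Usage on a built-in data set: compute once, render several times.
fit <- pca(as.matrix(iris[, 1:4]), scale. = TRUE)
plot(fit)                                        # default view: scores
plot(fit, type = "scree")
plotScores(fit, col = as.integer(iris$Species))
```

Because the PCA is computed once in pca(), re-plotting with different
colours or a different type does not redo the computation.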
>>
>> Michael
>>
>> On Tue, Oct 21, 2014 at 12:54 AM, Thomas Dybdal Pedersen
>> <thomas...@gmail.com> wrote:
>> While I tend to agree with you that PCA is too big an operation to be hidden
>> within a plotting function (MDS is an edge-case I would say), I can't see
>> how we can ever reach a point where there is only one generic plot function.
>> In the case of PCA there are a number of different plot-types that can all
>> lay claim to the plot function of a PCA class, for instance scoreplot,
>> scatterplot matrix of all scores, biplot, screeplot, accumulated R^2
>> barplot, leverage vs. distance-to-model... (you get the idea). So even with
>> some very well-thought-out classes for very common result types such
>> as PCA, such a class would still need a lot of different plot methods such as
>> plotScores, plotScree etc (or plot(..., type='score'), but I don't find that
>> very appealing). Expanding beyond PCA only muddies the water even more -
>> there are very few interesting data structures that only have one visual
>> representation to-rule-them-all...
>>
>> just my 2c
>>
>> best
>> Thomas
>>
>>
>> > Date: Mon, 20 Oct 2014 18:50:48 -0400
>> > From: Kevin Coombes <kevin.r.coom...@gmail.com>
>> >
>> > Well. I have two responses to that.
>> >
>> > First, I think it would be a lot better/easier for users if (most)
>> > developers could make use of the same plot function for "basic" classes
>> > like PCA.
>> >
>> > Second, if you think the basic PCA plotting routine needs enhancements,
>> > you still have two options. On the one hand, you could (as you said)
>> > try to convince the maintainer of PCA to add what you want. If it's
>> > generally valuable, then he'd probably do it --- and other classes that
>> > use it would benefit.
>> > On the other hand, if it really is a special
>> > enhancement that only makes sense for your class, then you can derive a
>> > class from the basic PCA class
>> >     setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
>> > and implement your own version of the "plot" generic for this class.
>> > And you could tweak the "as.PCA" function so it returns an object of the
>> > mySpecialPCA class. And the user could still just "plot" the result
>> > without having to care what's happening behind the scenes.
>> >
>> > On 10/20/2014 5:59 PM, Michael Love wrote:
>> >> Ah, I see now. Personally, I don't think Bioconductor developers
>> >> should have to agree on single plotting functions for basic classes
>> >> like 'PCA' (because this logic applies equally to the situation of all
>> >> Bioconductor developers agreeing on a single MA-plot, a single
>> >> variance-mean plot, etc). I think letting developers define their
>> >> plotPCA makes contributions easier (I don't have to ask the owner of
>> >> plot.PCA to incorporate something), even though it means we have a
>> >> growing list of generics.
>> >>
>> >> Still you have a good point about splitting computation and plotting.
>> >> In practice, we subset the rows so PCA is not laborious.
>> >>
>> >>
>> >> On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes
>> >> <kevin.r.coom...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I don't see how it needs more functions (as long as you can get
>> >> developers to agree). Suppose that someone can define a reusable
>> >> PCA class. This will contain a single "plot" generic function,
>> >> defined once and reused by other classes. The existing "plotPCA"
>> >> interface can also be implemented just once, in this class, as
>> >>
>> >>     plotPCA <- function(object, ...) plot(as.PCA(object), ...)
>> >>
>> >> This can be exposed to users of your class through namespaces.
>> >> Then the only thing a developer needs to implement in his own
>> >> class is the single "as.PCA" function. And he/she would have
>> >> already been required to implement this as part of the old
>> >> "plotPCA" function. So it can be extracted from that, and the
>> >> developer doesn't have to reimplement the visualization code from
>> >> the PCA class.
>> >>
>> >> Best,
>> >> Kevin
>> >>
>> >>
>> >> On 10/20/2014 5:15 PM, davide risso wrote:
>> >>> Hi Kevin,
>> >>>
>> >>> I see your points and I agree (especially for the specific case
>> >>> of plotPCA that involves some non-trivial computations).
>> >>>
>> >>> On the other hand, having a wrapper function that, starting from
>> >>> the "raw" data, gives you a pretty picture (with virtually zero
>> >>> effort by the user) using a sensible choice of parameters that
>> >>> are more or less OK for RNA-seq data is useful for practitioners
>> >>> that just want to look for patterns in the data.
>> >>>
>> >>> I guess it would be the same to have a PCA method for each of the
>> >>> objects and then use the plot method on those new objects, but
>> >>> that would just create a lot more objects and functions than the
>> >>> current approach (like Mike was saying).
>> >>>
>> >>> Your "as.pca" or "performPCA" approach would be definitely better
>> >>> if all the different methods would create objects of the *same*
>> >>> PCA class, but since we are talking about different packages, I
>> >>> don't know how easy it would be to coordinate. But perhaps this
>> >>> is the way we should go.
>> >>>
>> >>> Best,
>> >>> davide
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
>> >>> <kevin.r.coom...@gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> It depends.
>> >>>
>> >>> The "traditional" R approach to these matters is that you (a)
>> >>> first perform some sort of an analysis and save the results
>> >>> as an object and then (b) show or plot what you got. It is
>> >>> part (b) that tends to be really generic, and (in my opinion)
>> >>> should have really generic names -- like "show" or "plot" or
>> >>> "hist" or "image".
>> >>>
>> >>> With PCA in particular, you usually have to perform a bunch
>> >>> of computations in order to get the principal components from
>> >>> some part of the data. As I understand it now, these
>> >>> computations are performed along the way as part of the
>> >>> various "plotPCA" functions. The "R way" to do this would be
>> >>> something like
>> >>>     pca <- performPCA(mySpecialObject) # or as.PCA(mySpecialObject)
>> >>>     plot(pca) # to get the scatter plot
>> >>> This approach has the user-friendly advantage that you can
>> >>> tweak the plot (in terms of colors, symbols, ranges, titles,
>> >>> etc) without having to recompute the principal components
>> >>> every time. (I often find myself re-plotting the same PCA
>> >>> several times, with different colors or symbols for different
>> >>> factors associated with the samples.) In addition, you could
>> >>> then also do something like
>> >>>     screeplot(pca)
>> >>> to get a plot of the percentages of variance explained.
>> >>>
>> >>> My own feeling is that if the object doesn't know what to do
>> >>> when you tell it to "plot" itself, then you haven't got the
>> >>> right abstraction.
>> >>>
>> >>> You may still end up needing generics for each kind of
>> >>> computation you want to perform (PCA, RLE, MA, etc), which is
>> >>> why I suggested an "as.PCA" function. After all, "as" is
>> >>> already pretty generic.
>> >>> In the long run, this would help
>> >>> Bioconductor developers, since they wouldn't all have to
>> >>> reimplement the visualization code; they would just have to
>> >>> figure out how to convert their own object into a PCA or RLE
>> >>> or MA object.
>> >>>
>> >>> And I know that this "plotWhatever" approach is used
>> >>> elsewhere in Bioconductor, and it has always bothered me. It
>> >>> just seemed that a post suggesting a new generic function
>> >>> provided a reasonable opportunity to point out that there
>> >>> might be a better way.
>> >>>
>> >>> Best,
>> >>> Kevin
>> >>>
>> >>> PS: My own "ClassDiscovery" package, which is available from
>> >>> R-Forge via
>> >>>     install.packages("ClassDiscovery", repos="http://R-Forge.R-project.org")
>> >>> includes a "SamplePCA" class that does something roughly
>> >>> similar to this for microarrays.
>> >>>
>> >>> PPS (off-topic): The worst offender in base R -- because it
>> >>> doesn't use this "typical" approach -- is the "heatmap"
>> >>> function. Having tried to teach this function in several
>> >>> different classes, I have come to the conclusion that it is
>> >>> basically unusable by mortals. And I think the problem is
>> >>> that it tries to combine too many steps -- clustering rows,
>> >>> clustering columns, scaling, visualization -- all in a single
>> >>> function.
>> >>>
>> >>>
>> >>> On 10/20/2014 3:47 PM, davide risso wrote:
>> >>>> Hi Kevin,
>> >>>>
>> >>>> I don't agree. In the case of EDASeq (as I suppose it is the
>> >>>> case for DESeq/DESeq2) plotting the principal components of
>> >>>> the count matrix is only one of several possible exploratory plots
>> >>>> (RLE plots, MA plots, etc.).
>> >>>> So, in my opinion, it makes more sense from an object
>> >>>> oriented point of view to have multiple plotting methods for
>> >>>> a single "RNA-seq experiment" object.
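
For comparison, the as.PCA()/plotPCA() wrapper that Kevin describes in the
quoted text above could look roughly like this in S4. The class name, the
slot names and the matrix method are assumptions made up for the sketch; a
real package would write as.PCA() for its own container class instead.

```r
library(methods)

# Hypothetical shared result class; slot names are illustrative only.
setClass("PCA", slots = c(scores = "matrix", sdev = "numeric"))

# Each package would supply only the conversion from its own class.
setGeneric("as.PCA", function(object, ...) standardGeneric("as.PCA"))

# Stand-in conversion for a plain matrix (samples in rows).
setMethod("as.PCA", "matrix", function(object, ...) {
  fit <- stats::prcomp(object, ...)
  new("PCA", scores = fit$x, sdev = fit$sdev)
})

# The rendering is written once, against the shared class.
setMethod("plot", signature(x = "PCA", y = "missing"), function(x, y, ...) {
  plot(x@scores[, 1:2], ...)
  invisible(x)
})

# Backward-compatible wrapper, as written in the quoted message.
plotPCA <- function(object, ...) plot(as.PCA(object), ...)

# A specialised class inherits the plot method unless it overrides it.
setClass("mySpecialPCA", contains = "PCA")

# Usage: convert once, then plot either way.
counts <- as.matrix(iris[, 1:4])
pc <- as.PCA(counts, scale. = TRUE)
plot(pc, col = as.integer(iris$Species))
plotPCA(counts, col = as.integer(iris$Species))
```

Note that in this form the extra arguments to plotPCA() reach only the
plot method, so options for the computation have to be given to as.PCA()
directly.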
>> >>>>
>> >>>> In addition, this is the same strategy adopted elsewhere in
>> >>>> Bioconductor, e.g., for the plotMA method.
>> >>>>
>> >>>> Just my two cents.
>> >>>>
>> >>>> Best,
>> >>>> davide
>> >>>>
>> >>>> On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
>> >>>> <kevin.r.coom...@gmail.com> wrote:
>> >>>>
>> >>>> I understand that breaking code is a problem, and that
>> >>>> is admittedly the main reason not to immediately adopt
>> >>>> my suggestion.
>> >>>>
>> >>>> But as a purely logical exercise, creating a "PCA"
>> >>>> object X or something similar and using either
>> >>>>     plot(X)
>> >>>> or
>> >>>>     plot(as.PCA(mySpecialObject))
>> >>>> is a much more sensible use of object-oriented
>> >>>> programming/design. This requires no new generics (to
>> >>>> write or to learn).
>> >>>>
>> >>>> And you could use it to transition away from the current
>> >>>> system by convincing the various package maintainers to
>> >>>> re-implement plotPCA as follows:
>> >>>>
>> >>>>     plotPCA <- function(object, ...) {
>> >>>>         plot(as.PCA(object), ...)
>> >>>>     }
>> >>>>
>> >>>> This would be relatively easy to eventually deprecate
>> >>>> and teach users to switch to the alternative.
>> >>>>
>> >>>>
>> >>>> On 10/20/2014 1:07 PM, Michael Love wrote:
>> >>>>> hi Kevin,
>> >>>>>
>> >>>>> that would imply there is only one way to plot an
>> >>>>> object of a given class. Additionally, it would break a
>> >>>>> lot of code.
>> >>>>>
>> >>>>> best,
>> >>>>>
>> >>>>> Mike
>> >>>>>
>> >>>>> On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
>> >>>>> <kevin.r.coom...@gmail.com> wrote:
>> >>>>>
>> >>>>> But shouldn't they all really just be named "plot"
>> >>>>> for the appropriate objects?
>> >>>>> In which case, there
>> >>>>> would already be a perfectly good generic....
>> >>>>>
>> >>>>> On Oct 20, 2014 10:27 AM, "Michael Love"
>> >>>>> <michaelisaiahl...@gmail.com> wrote:
>> >>>>>
>> >>>>> I noticed that 'plotPCA' functions are defined
>> >>>>> in EDASeq, DESeq2, DESeq, affycoretools, Rcade, facopy,
>> >>>>> CopyNumber450k, netresponse, MAIT (maybe more).
>> >>>>>
>> >>>>> Sounds like a case for BiocGenerics.
>> >>>>>
>> >>>>> best,
>> >>>>>
>> >>>>> Mike
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Bioc-devel@r-project.org mailing list
>> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >>>>
>> >>>> --
>> >>>> Davide Risso, PhD
>> >>>> Post Doctoral Scholar
>> >>>> Division of Biostatistics
>> >>>> School of Public Health
>> >>>> University of California, Berkeley
>> >>>> 344 Li Ka Shing Center, #3370
>> >>>> Berkeley, CA 94720-3370
>> >>>> E-mail: davide.ri...@berkeley.edu
>> >
>> > End of Bioc-devel Digest, Vol 127, Issue 43

--
Steve Lianoglou
Computational Biologist
Genentech

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel