Re: [Bioc-devel] plotPCA for BiocGenerics

Steve Lianoglou Fri, 31 Oct 2014 15:44:14 -0700

Hi,

On Fri, Oct 31, 2014 at 2:35 PM, Thomas Lin Pedersen
<thomas...@gmail.com> wrote:
> With regards to abstraction - I would personally much rather read and write 
> code that contained plotScores() and plotScree() etc. where the intent of the 
> code is clearly communicated, instead of relying on a plot() function whose 
> result is only known from experience. Trying to squeeze every kind of visual 
> output into the same plot generic seems artificial and constrained to me. I 
> totally agree on the plotPCA critique on the other hand...


If we've bought a ticket to ride on Kevin's and Michael's (and whoever
else) train of thought, wouldn't plot(pca(x), type='scree') or
plot(pca(x), type='scores') be the preferred way to go ... for some
definition of "preferable"?
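
A minimal sketch of what that could look like, assuming a hypothetical
pca() constructor and a hypothetical S3 class "PCA" that simply wraps
stats::prcomp() (none of these names come from an existing package). The
named wrappers at the end show that plotScores()/plotScree() and the
type= argument need not exclude each other:

```r
# Hypothetical constructor: do the computation once, return a classed object.
pca <- function(x, ...) {
  fit <- stats::prcomp(x, ...)
  structure(list(fit = fit), class = "PCA")
}

# plot method for the hypothetical "PCA" class; `type` selects the view.
plot.PCA <- function(x, type = c("scores", "scree"), ...) {
  type <- match.arg(type)  # errors on an unknown type
  switch(type,
    scores = plot(x$fit$x[, 1:2], ...),       # scatter plot of PC1 vs PC2
    scree  = stats::screeplot(x$fit, ...)     # variances of the components
  )
  invisible(x)
}

# Intent-revealing names as thin wrappers over the same method.
plotScores <- function(x, ...) plot(x, type = "scores", ...)
plotScree  <- function(x, ...) plot(x, type = "scree", ...)

# The PCA is computed once, here on a built-in dataset, and can then be
# plotted as often as needed without recomputation.
p <- pca(USArrests, scale. = TRUE)
```

With that in place, either spelling works on the same object:

    plot(p, type = "scree")
    plotScores(p, col = "steelblue")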

-steve

>
> Thomas
>
>
>> On 31 Oct 2014, at 22:09, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>>
>> I strongly agree with Kevin's position. plotPCA() represents two separate 
>> concerns in its very name: the computation and the rendering. Those need to 
>> be separated, at least behind the scenes. The syntax of plot(pca(x)) is 
>> preferable to plotPCA, because the structure of the operation is represented 
>> in the expression itself, not just in a non-computable function name.
>>
>> With regard to how a plot,PCA should behave: there is always a tension 
>> between high-level and low-level APIs. In the end, we need multiple levels 
>> of abstraction.  While high-level APIs sacrifice flexibility, we need them 
>> because they communicate the high-level *intent* of the user in the code 
>> itself (self-documenting code), and they enable reusability, which not only 
>> reduces redundant effort but also ensures consistency. Once our brains no 
>> longer need to parse low-level code, we can focus our mental power on 
>> correctness and efficiency. To design a high-level API, one needs to 
>> carefully analyze user requirements, i.e., the use cases. To choose the 
>> default behavior, one needs to rate the use cases by their prevalence, and 
>> by how closely they match the intuition-based expectations of the user.
>>
>> The fact that at least 9 packages are performing such a similar task seems 
>> to indicate that a common abstraction is warranted, but I am not sure if 
>> BiocGenerics is the appropriate place.
>>
>> Michael
>>
>> On Tue, Oct 21, 2014 at 12:54 AM, Thomas Dybdal Pedersen 
>> <thomas...@gmail.com> wrote:
>> While I tend to agree with you that PCA is too big an operation to be hidden 
>> within a plotting function (MDS is an edge-case I would say), I can't see 
>> how we can ever reach a point where there is only one generic plot function. 
>> In the case of PCA there are a number of different plot types that can all 
>> lay claim to the plot function of a PCA class, for instance scoreplot, 
>> scatterplot matrix of all scores, biplot, screeplot, accumulated R^2 
>> barplot, leverage vs. distance-to-model... (you get the idea). So while 
>> having some very well-thought-out classes for very common result types such 
>> as PCA would help, such a class would still need a lot of different plot 
>> methods such as plotScores, plotScree etc. (or plot(..., type='score'), but 
>> I don't find that very appealing). Expanding beyond PCA only muddies the 
>> waters even more - 
>> there are very few interesting data structures that only have one visual 
>> representation to-rule-them-all...
>>
>> just my 2c
>>
>> best
>> Thomas
>>
>>
>> > Date: Mon, 20 Oct 2014 18:50:48 -0400
>> > From: Kevin Coombes <kevin.r.coom...@gmail.com>
>> >
>> > Well. I have two responses to that.
>> >
>> > First, I think it would be a lot better/easier for users if (most)
>> > developers could make use of the same plot function for "basic" classes
>> > like PCA.
>> >
>> > Second, if you think the basic PCA plotting routine needs enhancements,
>> > you still have two options.  On the one hand, you could (as you said)
>> > try to convince the maintainer of PCA to add what you want.  If it's
>> > generally valuable, then he'd probably do it --- and other classes that
>> > use it would benefit.  On the other hand, if it really is a special
>> > enhancement that only makes sense for your class, then you can derive a
>> > class from the basic PCA class
>> >     setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
>> >  and implement your own version of the "plot" generic for this class.
>> > And you could tweak the "as.PCA" function so it returns an object of the
>> > mySpecialPCA class. And the user could still just "plot" the result
>> > without having to care what's happening behind the scenes.
>> >
>> > On 10/20/2014 5:59 PM, Michael Love wrote:
>> >> Ah, I see now. Personally, I don't think Bioconductor developers
>> >> should have to agree on single plotting functions for basic classes
>> >> like 'PCA' (because this logic applies equally to the situation of all
>> >> Bioconductor developers agreeing on single MA-plot, a single
>> >> variance-mean plot, etc). I think letting developers define their
>> >> plotPCA makes contributions easier (I don't have to ask the owner of
>> >> plot.PCA to incorporate something), even though it means we have a
>> >> growing list of generics.
>> >>
>> >> Still you have a good point about splitting computation and plotting.
>> >> In practice, we subset the rows so PCA is not laborious.
>> >>
>> >>
>> >> On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes
>> >> <kevin.r.coom...@gmail.com> wrote:
>> >>
>> >>    Hi,
>> >>
>> >>    I don't see how it needs more functions (as long as you can get
>> >>    developers to agree).  Suppose that someone can define a reusable
>> >>    PCA class.  This will contain a single "plot" generic function,
>> >>    defined once and reused by other classes. The existing "plotPCA"
>> >>    interface can also be implemented just once, in this class, as
>> >>
>> >>        plotPCA <- function(object, ...) plot(as.PCA(object), ...)
>> >>
>> >>    This can be exposed to users of your class through namespaces.
>> >>    Then the only thing a developer needs to implement in his own
>> >>    class is the single "as.PCA" function.  And he/she would have
>> >>    already been required to implement this as part of the old
>> >>    "plotPCA" function.  So it can be extracted from that, and the
>> >>    developer doesn't have to reimplement the visualization code from
>> >>    the PCA class.
>> >>
>> >>    Best,
>> >>      Kevin
>> >>
>> >>
>> >>    On 10/20/2014 5:15 PM, davide risso wrote:
>> >>>    Hi Kevin,
>> >>>
>> >>>    I see your points and I agree (especially for the specific case
>> >>>    of plotPCA that involves some non trivial computations).
>> >>>
>> >>>    On the other hand, having a wrapper function that starting from
>> >>>    the "raw" data gives you a pretty picture (with virtually zero
>> >>>    effort by the user) using a sensible choice of parameters that
>> >>>    are more or less OK for RNA-seq data is useful for practitioners
>> >>>    that just want to look for patterns in the data.
>> >>>
>> >>>    I guess it would be the same to have a PCA method for each of the
>> >>>    objects and then using the plot method on those new objects, but
>> >>>    that would just create a lot more objects and functions than the
>> >>>    current approach (like Mike was saying).
>> >>>
>> >>>    Your "as.pca" or "performPCA" approach would be definitely better
>> >>>    if all the different methods would create objects of the *same*
>> >>>    PCA class, but since we are talking about different packages, I
>> >>>    don't know how easy it would be to coordinate. But perhaps this
>> >>>    is the way we should go.
>> >>>
>> >>>    Best,
>> >>>    davide
>> >>>
>> >>>
>> >>>
>> >>>    On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
>> >>>    <kevin.r.coom...@gmail.com> wrote:
>> >>>
>> >>>        Hi,
>> >>>
>> >>>        It depends.
>> >>>
>> >>>        The "traditional" R approach to these matters is that you (a)
>> >>>        first perform some sort of an analysis and save the results
>> >>>        as an object and then (b) show or plot what you got.  It is
>> >>>        part (b) that tends to be really generic, and (in my opinion)
>> >>>        should have really generic names -- like "show" or "plot" or
>> >>>        "hist" or "image".
>> >>>
>> >>>        With PCA in particular, you usually have to perform a bunch
>> >>>        of computations in order to get the principal components from
>> >>>        some part of the data.  As I understand it now, these
>> >>>        computations are performed along the way as part of the
>> >>>        various "plotPCA" functions.  The "R way" to do this would be
>> >>>        something like
>> >>>            pca <- performPCA(mySpecialObject)  # or as.PCA(mySpecialObject)
>> >>>            plot(pca) # to get the scatter plot
>> >>>        This approach has the user-friendly advantage that you can
>> >>>        tweak the plot (in terms of colors, symbols, ranges, titles,
>> >>>        etc) without having to recompute the principal components
>> >>>        every time. (I often find myself re-plotting the same PCA
>> >>>        several times, with different colors or symbols for different
>> >>>        factors associated with the samples.) In addition, you could
>> >>>        then also do something like
>> >>>            screeplot(pca)
>> >>>        to get a plot of the percentages of variance explained.
>> >>>
>> >>>        My own feeling is that if the object doesn't know what to do
>> >>>        when you tell it to "plot" itself, then you haven't got the
>> >>>        right abstraction.
>> >>>
>> >>>        You may still end up needing generics for each kind of
>> >>>        computation you want to perform (PCA, RLE, MA, etc), which is
>> >>>        why I suggested an "as.PCA" function.  After all, "as" is
>> >>>        already pretty generic.  In the long run, this would help
>> >>>        Bioconductor developers, since they wouldn't all have to
>> >>>        reimplement the visualization code; they would just have to
>> >>>        figure out how to convert their own object into a PCA or RLE
>> >>>        or MA object.
>> >>>
>> >>>        And I know that this "plotWhatever" approach is used
>> >>>        elsewhere in Bioconductor, and it has always bothered me. It
>> >>>        just seemed that a post suggesting a new generic function
>> >>>        provided a reasonable opportunity to point out that there
>> >>>        might be a better way.
>> >>>
>> >>>        Best,
>> >>>          Kevin
>> >>>
>> >>>        PS: My own "ClassDiscovery" package, which is available from
>> >>>        R-Forge via
>> >>>            install.packages("ClassDiscovery",
>> >>>                             repos="http://R-Forge.R-project.org")
>> >>>        includes a "SamplePCA" class that does something roughly
>> >>>        similar to this for microarrays.
>> >>>
>> >>>        PPS (off-topic): The worst offender in base R -- because it
>> >>>        doesn't use this "typical" approach -- is the "heatmap"
>> >>>        function.  Having tried to teach this function in several
>> >>>        different classes, I have come to the conclusion that it is
>> >>>        basically unusable by mortals.  And I think the problem is
>> >>>        that it tries to combine too many steps -- clustering rows,
>> >>>        clustering columns, scaling, visualization -- all in a single
>> >>>        function.
>> >>>
>> >>>
>> >>>        On 10/20/2014 3:47 PM, davide risso wrote:
>> >>>>        Hi Kevin,
>> >>>>
>> >>>>        I don't agree. In the case of EDASeq (as I suppose it is the
>> >>>>        case for DESeq/DESeq2) plotting the principal components of
>> >>>>        the count matrix is only one of possible exploratory plots
>> >>>>        (RLE plots, MA plots, etc.).
>> >>>>        So, in my opinion, it makes more sense from an object
>> >>>>        oriented point of view to have multiple plotting methods for
>> >>>>        a single "RNA-seq experiment" object.
>> >>>>
>> >>>>        In addition, this is the same strategy adopted elsewhere in
>> >>>>        Bioconductor, e.g., for the plotMA method.
>> >>>>
>> >>>>        Just my two cents.
>> >>>>
>> >>>>        Best,
>> >>>>        davide
>> >>>>
>> >>>>        On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
>> >>>>        <kevin.r.coom...@gmail.com> wrote:
>> >>>>
>> >>>>            I understand that breaking code is a problem, and that
>> >>>>            is admittedly the main reason not to immediately adopt
>> >>>>            my suggestion.
>> >>>>
>> >>>>            But as a purely logical exercise, creating a "PCA"
>> >>>>            object X or something similar and using either
>> >>>>                plot(X)
>> >>>>            or
>> >>>>                plot(as.PCA(mySpecialObject))
>> >>>>            is a much more sensible use of object-oriented
>> >>>>            programming/design. This requires no new generics (to
>> >>>>            write or to learn).
>> >>>>
>> >>>>            And you could use it to transition away from the current
>> >>>>            system by convincing the various package maintainers to
>> >>>>            re-implement plotPCA as follows:
>> >>>>
>> >>>>            plotPCA <- function(object, ...) {
>> >>>>              plot(as.PCA(object), ...)
>> >>>>            }
>> >>>>
>> >>>>            This would be relatively easy to eventually deprecate
>> >>>>            and teach users to switch to the alternative.
>> >>>>
>> >>>>
>> >>>>            On 10/20/2014 1:07 PM, Michael Love wrote:
>> >>>>>            hi Kevin,
>> >>>>>
>> >>>>>            that would imply there is only one way to plot an
>> >>>>>            object of a given class. Additionally, it would break a
>> >>>>>            lot of code.
>> >>>>>
>> >>>>>            best,
>> >>>>>
>> >>>>>            Mike
>> >>>>>
>> >>>>>            On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
>> >>>>>            <kevin.r.coom...@gmail.com> wrote:
>> >>>>>
>> >>>>>                But shouldn't they all really just be named "plot"
>> >>>>>                for the appropriate objects?  In which case, there
>> >>>>>                would already be a perfectly good generic....
>> >>>>>
>> >>>>>                On Oct 20, 2014 10:27 AM, "Michael Love"
>> >>>>>                <michaelisaiahl...@gmail.com> wrote:
>> >>>>>
>> >>>>>                    I noticed that 'plotPCA' functions are defined
>> >>>>>                    in EDASeq, DESeq2, DESeq,
>> >>>>>                    affycoretools, Rcade, facopy, CopyNumber450k,
>> >>>>>                    netresponse, MAIT (maybe
>> >>>>>                    more).
>> >>>>>
>> >>>>>                    Sounds like a case for BiocGenerics.
>> >>>>>
>> >>>>>                    best,
>> >>>>>
>> >>>>>                    Mike
>> >>>>>
>> >>>>>                    _______________________________________________
>> >>>>>                    Bioc-devel@r-project.org mailing list
>> >>>>>                    https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>        --
>> >>>>        Davide Risso, PhD
>> >>>>        Post Doctoral Scholar
>> >>>>        Division of Biostatistics
>> >>>>        School of Public Health
>> >>>>        University of California, Berkeley
>> >>>>        344 Li Ka Shing Center, #3370
>> >>>>        Berkeley, CA 94720-3370
>> >>>>        E-mail: davide.ri...@berkeley.edu
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>
>



-- 
Steve Lianoglou
Computational Biologist
Genentech

