Thanks for those additional comments, Levi. I don't think you were being unfair to phyloseq, and it sounds like some of the issues involved are still relevant (e.g. new domain, new contributor).
I wanted to riff on Martin's comment about "incremental gain rather than perfection". I imagine there are many reasons to use this as a guiding principle, but one of them could be that it helps foster new-to-BioC contributors by not overwhelming them with sub-critical requirements. Interoperability is a feature that developers are already incentivized to leverage, when they realize it, if they haven't buried themselves too deep O:-). Similarly, after taking a peek at those developer docs, I think buildingPackagesForBioC <http://bioconductor.org/developers/how-to/buildingPackagesForBioc/> looks pretty great, and I wish it had existed (or I had known about it) when I unwittingly blundered these early decisions in phyloseq. Thanks again for the discussion/feedback! --- --- "Joey" Paul J. McMurdie II Sent from Gmail On Thu, Oct 19, 2017 at 9:53 AM, Martin Morgan < martin.mor...@roswellpark.org> wrote: > On 10/19/2017 10:12 AM, Levi Waldron wrote: > >> Thanks for all your thoughts Joey, and I hope I didn't come across as >> critical of phyloseq in particular. In fact, the couple packages I created >> as a post-doc (doppelgangR and ffpe) did exactly the same thing, but >> unlike >> phyloseq have never been used enough for it to make much difference :). >> This comparison with phyloseq and metagenomeSeq is just one that I have >> personal experience with because I use both packages and it was something >> of a revelation to me when I realized that MRExperiment and all other >> eSet-derived classes would "just work" as elements of a >> MultiAssayExperiment. My motivation in making these slides was mainly to >> share my own very recent revelations and try to make life better for new >> package developers and their users. A couple comments below: >> >> >> On Wed, Oct 18, 2017 at 11:36 PM, Paul Joseph McMurdie <joey...@gmail.com >> > >> wrote: >> >> >>> - Hopefully it is obvious from my description, and also what I imagine to >>> be Levi's motivation for making the slide deck, but somehow new eager >>> developers are missing out on this great infrastructure and it isn't >>> because they want to re-implement core stuff. I sure didn't! I simply >>> didn't know what was there or best practices for BioC. *A "beginner's >>> guide to BioC package development"* would have been at the top of my list >>> of things to read back then. >>> >>> >> I think this is still not abundantly clear from >> http://bioconductor.org/developers/ and I'm not sure how much it factors >> into the package review process. Would it be helpful to have a short >> questionnaire for creators of new packages that includes some things not >> part of R CMD BiocCheck, or CHECK, like - what new classes do you define, >> and what is the purpose of defining them? >> > > [ should have mentioned that the slack channel is #tree-like-se ] > > A couple of quick comments on new packages / reviews. > > New package submissions have a checklist that people submit with their > package. Since it is not uncommon to find checked items that have not been > satisfied, I'm quite cynical about this as a way of enforcing good > behavior, though maybe it does something to elevate awareness. > > One of the check-list items is acknowledge reading the package guidelines > > https://bioconductor.org/developers/package-guidelines/ > > which include an S4 section > > https://bioconductor.org/developers/package-guidelines/#classes > > and links to general guidance > > > https://bioconductor.org/developers/how-to/efficient-code/# > re-use-existing-functionality > > as well as emphasis on core containers > > https://bioconductor.org/developers/how-to/commonMethodsAndClasses/ > > A relatively humorous pattern is for newer team members to eventually > lament that new packages don't re-use existing classes and methods, > suggesting that more documentation is the solution; usually the litany > above is cited, and the newer team member either tweaks or adds to the > existing documentation to the point that they think it is surely > satisfactory, or sees that more documentation is not an effective answer. > This pattern started again yesterday... > > One thing that Joey said, and that I think is true of a lot of new package > contributors, is that they are 'new' developers. This is somewhat ironic, > since they are writing packages that are meant (and sometimes are, as with > phyloseq) to be widely used and influential; it seems somehow appropriate > to keep that channel open, with the domain and other expertise that 'new' > developers bring as a trade-off against the sophistication of their current > development skills. Of course being new developers mean that they make both > 'mistakes' and choices that are not entirely consistent with the overall > project, e.g., use of S3 or R6 rather than S4 classes, reimplementation of > existing functionality, adopting unfamiliar API conventions (a minor > favorite of mine is the presentation of a novel method for some aspect of > differential expression, with the primary input a matrix of samples x > 'genes' rather than the bioinformatics convention of 'genes' x samples). > The diversity of programming and analysis approaches available in R makes > it not at all surprising that the insights that can come even after many > years of working with Bioconductor are not accessible to many new package > contributors. > > I'll also mention my own philosophy (which I sometimes emphasize with > other members of the core team of reviewers) in reviewing packages, which > is to seek incremental gain rather than perfection; sometimes there are > issues other than S4 class use (!) where greater strides can be made. > > Martin > > >> >> - It isn't that I didn't read other established packages. I did. However, >>> a lot of core BioC tools had gene-expression specific names even for data >>> classes that were not intrinsically gene expression (e.g. it's actually a >>> matrix, or related tables) -- and I'm happy many of these now use more >>> general names like "experiment" or "row". The old names signaled to me >>> "this isn't for you". And I naively, ignorantly, accepted that at mostly >>> face value. >>> >>> - Conversely, sometimes not-inheriting methods is a feature, because it >>> protects users from doing something that is great and appropriate for one >>> domain (gene expression) but totally irrational in another (microbiome). >>> I'm not saying my original implementation made great nuanced decisions >>> about this -- it has many trappings of a new developer -- but I did have >>> some pretty naive users in mind with phyloseq, for whom navigating legacy >>> methods and method names from other domain(s) was expected to be a >>> hurdle. >>> Curious to hear thoughts on this. >>> >>> >> Something you did well in phyloseq was define methods for all common user >> operations, and if you had inherited from eSet, you probably still >> would've >> wanted to do this - except that more of your methods would've been just >> wrappers for inappropriately named eSet methods, and your average user >> wouldn't have known the difference. Because of your use of methods and not >> direct slot access by users, I think you could still change the underlying >> data model without it being noticeable to basic users, even if it expanded >> the number of methods actually available. Thankfully, SummarizedExperiment >> I think now avoids these "this isn't for you" signals. >> >> >> - There actually *still isn't core support for evolutionary trees in >>> BioC* (as >>> mentioned by Joe Paulson and Ben Callahan in other threads). One of >>> phyloseq's key contributions was to leverage the fantastic representation >>> of trees implemented in the CRAN package "ape" in order to support >>> analysis >>> techniques popular among microbiome researchers that require a >>> phylogenetic >>> tree. The integration in the phyloseq-class and ape is necessarily pretty >>> deep, including certain row operations. Users also needed a familiar and >>> simple R interface to manipulate that composite object despite the >>> complex >>> hierarchical relationship among rows. Correct me if I'm wrong, but I >>> think >>> there is still no core BioC support for representing tree-like or >>> bio-taxonomy-like hierarchy among rows in a SummarizedExperiment, or >>> equivalent; and consequently certain row operations may have to be >>> modified >>> more deeply than usual if we were to re-implement phyloseq "the right >>> way". >>> I'd love to hear thoughts on this. >>> >>> >> AFAIK you're right, and I don't know the solution, although I hope it can >> be built on SummarizedExperiment. Looking forward to talking more about >> this. >> >> >> Even though phyloseq is at the receiving end, I think the criticism is >>> fair, and I want current and future new BioC contributors to not re-make >>> my >>> mistakes circa 2011-12. I'm happy to help if I can. >>> >>> Cheers, and thanks for the interesting, collegial thread. >>> >>> Joey >>> >>> >> Thanks Joey, and I do want to say also that I think phyloseq is >> responsible >> for making Bioconductor a viable and already superior choice for >> statistical analysis of microbiome data! >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> > > This email message may contain legally privileged and/or confidential > information. If you are not the intended recipient(s), or the employee or > agent responsible for the delivery of this message to the intended > recipient(s), you are hereby notified that any disclosure, copying, > distribution, or use of this email message is prohibited. If you have > received this message in error, please notify the sender immediately by > e-mail and delete this email message from your computer. Thank you. > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel