Hi Martin,

to come back to the original trigger for this thread: it was not a concern about reproducibility, but the fact that a Bioc package in the current release stopped working because a CRAN package changed in the meantime. What’s the most practical solution to this specific problem?

Best wishes
Wolfgang

On 23 Apr 2014, at 19:41, Martin Morgan <mtmor...@fhcrc.org> wrote:

> On 04/22/2014 09:47 AM, Kasper Daniel Hansen wrote:
>> I think we should have a CRAN snapshot (or a subset of CRAN used in Bioc)
>> inside each Bioc release; I don't know how hard that is to manage from a
>> technical point of view.
>
> I followed this thread with some interest.
>
> It would be surprisingly challenging to update even a 2.13 package -- the
> build machines have moved on to other tasks, unconstrained by the unique
> system dependencies needed for 2.13 builds.
>
> The idea of a 'forever' repository snapshot seems possible, but would the
> snapshot be at the beginning of the release and hence miss the few but
> important bug fixes introduced during the release, or at the end of the
> release, which might be after the time required for the purposes of
> replication? Either way it is certain that the peanut butter would land face
> down for one's particular need. Also, the need for the user to satisfy system
> dependencies becomes increasingly challenging, even with a binary repository.
> I don't think a central 'Bioc' solution would really address the problem of
> reproducibility.
>
> It is not that 'hard' for an individual group to create a snapshot of Bioc
> and CRAN, using rsync
>
>   http://www.bioconductor.org/about/mirrors/mirror-how-to/
>   http://cran.r-project.org/mirror-howto.html
>
> and to use install.packages() or even biocLite to access these (see
> ?setRepositories). This would again require that the system dependencies for
> these packages are satisfied in some kind of frozen fashion.
>
> A more robust possibility is of course a virtual machine, such as the AMI (or
> a customized version) we provide
>
>   http://www.bioconductor.org/help/bioconductor-cloud-ami/#ami_ids
>
> although these have only a subset of packages installed by default.
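
For reference, a minimal sketch of how an R session might be pointed at such a local snapshot once the rsync copies exist. The snapshot location and the sub-directory names below are hypothetical and not taken from the thread; they have to match however the mirror was actually laid out, and each repository directory needs the usual src/contrib/PACKAGES index.

```r
# Hypothetical location of a local rsync snapshot; adjust to your own setup.
snapshot <- "file:///srv/r-snapshot"

# Point R at the frozen copies instead of the live repositories.
# The sub-directory names ("bioc", "cran") are assumptions about the layout.
options(repos = c(
  BioCsoft = paste0(snapshot, "/bioc"),
  CRAN     = paste0(snapshot, "/cran")
))

# Inspect what install.packages() will now consult.
print(getOption("repos"))

# Packages and their dependencies would then resolve against the snapshot:
# install.packages("limma")
```

This only freezes the R package versions; as noted above, the system dependencies still have to be satisfied separately.
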
>
> The CRAN thread referenced earlier included this post
>
>   https://stat.ethz.ch/pipermail/r-devel/2014-March/068605.html
>
> which I think makes an important distinction between exact replication and
> scientific reproducibility; it is the latter that must be the most
> interesting, and the former that we somehow seem to stumble over. The thread
> also mentions best practices -- version control
>
>   http://bioconductor.org/developers/how-to/source-control/
>
> disciplined approach to deprecation
>
>   http://bioconductor.org/developers/how-to/deprecation/
>
> package versioning
>
>   http://bioconductor.org/developers/how-to/version-numbering/
>
> and the Bioc-style approach to release that we as developers can act on to
> enhance reproducibility. What other best practices can we more forcefully /
> conveniently adopt within the project?
>
> Martin
>
>>
>> Best,
>> Kasper
>>
>>
>> On Tue, Apr 22, 2014 at 6:06 PM, Julian Gehring
>> <julian.gehr...@embl.de> wrote:
>>
>>> Hi,
>>>
>>> For most problems discussed here, it seems that having a fixed version of
>>> package is sufficient rather than a specific version. If the idea of a
>>> snapshot with each bioc release would work (which still means one version
>>> per package), so would requiring that version within the package (one would
>>> just need to agree which version this is).
>>>
>>> Best wishes
>>>
>>> Julian
>>>
>>>
>>>> what if two Bioc packages require different version of the ‘same’ CRAN
>>>> package?
>>>> AfaIu, the infrastructure is not designed to deal with multiple versions
>>>> of a package.
>>>>
>>>> Nor would I as a user expect to have less-than-the-most recent versions
>>>> of CRAN packages in my library just because some other package says so…
>>>>
>>>> Just to throw in another, and probably silly suggestion: the Bioconductor
>>>> repository could keep ‘snapshots’ of CRAN packages compatible with each
>>>> release, but they would have to be name-mangled in some way.
>>>> The potential for confusion is enormous.
>>>>
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793