Hi all,

Greatly enjoying this discussion. A few comments on the current themes:
On Dockerfiles: Though I'm a dedicated Docker user (I publish Dockerfiles with my papers and have used Dockerized environments exclusively, both locally and remotely, for the past year and a half), I would push back on the idea that a Dockerfile is a complete recipe. I certainly believe writing a Dockerfile is a more practical solution than asking people to document dependencies manually -- if nothing else, it is easier to prove that it is the *actual* environment and not just what you *think* your environment is. However, a lot still depends on how you write your Dockerfile, and many of the issues are really just kicked upstream to things like the Debian release process. That's a good solution, since releases tend to keep most libraries stable (while usually offering backported security updates for a finite window), but it's not a magic bullet. And it's pretty easy & common to write your Dockerfile to just pull the latest version of packages off CRAN or pip install or whatnot, so the recipe builds the bleeding-edge software, not the versions you actually used (see the sketch at the bottom of this note). Who knows what longevity the binary Docker images have, but it must be acknowledged that it's the ugly, heavy binary images, not the nice Dockerfile snapshot, that provide the definitive environment.

General reproducibility: Very much agree with Titus that "what I need to run" feels like a more thorny problem for me than "what I'm running". I do wonder if that problem is particular to my own work and doesn't afflict the majority of users in my field, who might use a more 'vanilla' / off-the-shelf environment, or if we would find it to hold more generally if more researchers made "what I'm running" available in the first place. I'd love to have a more empirical picture of where reproducibility fails. Currently I conjecture that 99% of the research I encounter doesn't share scripts and probably isn't scripted to begin with, so we never even reach 'computational environment' issues. More SWC-style training on using and sharing scripts is probably the biggest win. After that, I suspect one gets pretty far doing what language-specific packages do for documentation: capturing dependencies at the level of an R DESCRIPTION file or a Python setup.py. That is, the hypothesis is that the next most common failure point is changes in high-level packages, rather than external C & Fortran libraries (think BLAS), compiler types & versions, differences in your kernel, or differences in your hardware or the cosmic rays passing through it (though we know all of those can matter!). The problems that Docker does vs does not address are perhaps a good study in which issues cause the most practical reproducibility problems, and which are special cases.

Damien, on logs & netcdf: Thanks for the clarifications -- very helpful, and we're very much on the same page. I agree with your picture that Makefiles, a Python script, a bash script, etc. are all operating in the same space here: they all play the dual role of a human-readable and executable workflow for the analysis. (Though personally I would roll your #4 item on 'logs' into the section on just providing code / scripts. I think SWC does a disservice, in both pedagogy and reproducibility, in teaching bash and bash scripting as something totally different from Python and Python scripting, but that's a discussion for another time.)
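The sketch referenced above -- a minimal illustration of the pinning point (the base image, package names, and version numbers are purely illustrative, not taken from any real project of mine):

    FROM debian:jessie
    RUN apt-get update && apt-get install -y python-pip

    # Unpinned: pulls whatever is newest on PyPI at build time, so rebuilding
    # the image later documents the bleeding edge, not the versions you ran.
    #RUN pip install numpy pandas

    # Pinned (versions illustrative): the recipe now records the environment
    # you actually used -- provided those releases stay available upstream,
    # which is exactly the "kicked upstream" caveat.
    RUN pip install numpy==1.10.4 pandas==0.17.1

Either way you're still trusting that whatever you pinned remains fetchable upstream, which is the Debian-release-process point again.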
On Wed, Jan 13, 2016 at 3:15 PM Tim Head <[email protected]> wrote:

> On Wed, Jan 13, 2016 at 11:46 PM Damien Irving <[email protected]> wrote:
>
>> @Tim - I'm interested in your comment regarding Dockerfiles: "A human can
>> read and re-produce it if we lost all the docker tools"
>>
>> Is the same true for any alternatives? For instance, if I put my
>> environment up on anaconda.org is there a dockerfile equivalent that
>> would allow it to be human read and re-produced even if conda disappeared?
>>
> Not sure. You can export your conda environment with:
>
> conda env export > environment.yml
>
> which produces a human readable file. I think, conda packages are "just"
> tarballs. So you could read the environment.yml, obtain the right conda
> packages, and then un-tar them. Something to try out.
>
> I think by default conda packages uploaded to binstar do not have the
> recipe for making them attached/linked. A bit like pushing a docker image
> to a registry without publishing the Dockerfile.
>
> T

--
http://carlboettiger.info
