Hi Jan,

Really interesting points. I’ve played with a number of these systems, and I had a few thoughts, but no solid answers. I think your concerns are completely valid.
Regarding the IPython/Jupyter notebooks, the JSON format is pretty simple and ubiquitous, so I don’t see it going anywhere anytime soon. I suspect that reading the notebooks (as nbviewer.org does) will still be fine ten years from now. Executing them, on the other hand, is less certain. It’s maybe worth noting that the cells do have execution indices, which you could (in principle) replay. Or, just never archive a notebook without resetting it and running it start to finish all at once. I wonder if there are some VirtualEnv tricks that could help with this; in that case, you could potentially bundle the whole Python environment that you used. As they stand, though, the notebooks are entirely dependent on the Python (or other kernel) environment behind them, and that’s not really frozen into the notebook.

I guess I see Docker and Anaconda as very different beasts. Anaconda is a distribution that gives a (fairly) uniform environment across different operating systems, but it doesn’t really prevent version creep. Docker is all about building containers up from a series of snapshots (on a graph, like commits in a Git tree). However, it requires a tremendous amount of infrastructure to keep it running. I see Docker as being about deploying tools in the here and now, but I don’t really think it makes a good archival format. It’s also moving way too fast, with too little attention paid to deprecation.

Python VirtualEnvs might do the trick, but many of us have tools and libraries that aren’t going to be in the VirtualEnv. It’s a tough prospect.

Best,
Brendan

--
Brendan Smithyman
Postdoctoral Fellow
Western University, Earth Sciences
Biological & Geological Sciences, Rm. 1045
London, ON, Canada N6A 5B7
c. 778.990.5957

> On Jan 13, 2016, at 2:06 PM, Jan Kim <[email protected]> wrote:
>
> Dear All,
>
> I'm preparing a talk centred on reproducible computing, and as this is
> a topic relevant and valued by SWC I'd like to ask your opinion and
> comments about this.
>
> One approach I take is checking to which extent my work from 10 - 20
> years ago is reproducible today, and (perhaps not surprisingly) I found
> that having used make, scripts and (relatively) well defined text
> formats turns out to be highly beneficial in this regard.
>
> This has led me to wonder about some of the tools that currently seem
> to be popular, including on this list, but to me appear unnecessarily
> fat / overloaded and as such to have an uncertain perspective for long
> term reproducibility:
>
>   * "notebook" systems, and iPython / jupyter in particular:
>     - Will the JSON format for saving notebooks be readable /
>       executable in the long term?
>     - Are these even reproducible in a rigorous sense, considering
>       that results can vary depending on the order of executing cells?
>
>   * Virtual machines and the recent lightweight "containerising"
>     systems (Docker, Conda): They're undoubtedly a blessing for
>     reproducibility but
>     - what are the long term perspectives of executing their images
>       / environments etc.?
>     - to which extent is their dependence on backing companies a
>       reason for concern?
>
> I hope that comments on these are relevant / interesting to the SWC
> community, in addition to providing me with insights / inspiration,
> and that therefore posting this here is ok.
>
> If you have comments on reproducible scientific computing in general,
> I'm interested as well -- please respond by mailing list or personal
> reply.
>
> Best regards & thanks in advance, Jan
> --
> +- Jan T. Kim -------------------------------------------------------+
> | email: [email protected] |
> | WWW: http://www.jtkim.dreamhosters.com/ |
> *-----=< hierarchical systems are for files, not for humans >=-----*
>
> _______________________________________________
> Discuss mailing list
> [email protected]
> http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
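
P.S. On the execution-index point in the reply above: here is a minimal sketch of how one might check, before archiving, whether a notebook looks like it was reset and run start to finish. It assumes the version 4 notebook JSON layout (a top-level "cells" list in which code cells carry an "execution_count"). The function names and the two small in-memory notebooks are hypothetical, made up for illustration. Treat it as a heuristic only: an empty or never-run code cell will make the check fail, and a cell edited after it was run will not be noticed.

```python
import json


def load_notebook(path):
    """Read an .ipynb file; it is plain JSON, so no Jupyter install is needed."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def execution_counts(notebook):
    """Return the execution_count of every code cell, in document order."""
    return [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]


def ran_top_to_bottom(notebook):
    """True if the code cells are numbered 1, 2, 3, ... in document order.

    That numbering is what a fresh kernel gives when each cell is run once,
    in order. Cells that were never run (count of None) fail the check.
    """
    counts = execution_counts(notebook)
    return counts == list(range(1, len(counts) + 1))


# Hypothetical notebooks standing in for real files (assumed v4 layout).
clean = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "execution_count": 1, "source": "x = 1"},
        {"cell_type": "code", "execution_count": 2, "source": "x + 1"},
    ],
}
shuffled = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "code", "execution_count": 3, "source": "x = 1"},
        {"cell_type": "code", "execution_count": 2, "source": "x + 1"},
    ],
}

print(ran_top_to_bottom(clean))
print(ran_top_to_bottom(shuffled))
```

For a real file, something like ran_top_to_bottom(load_notebook("analysis.ipynb")) would do the same check, where the file name is just a placeholder. This says nothing about whether the Python environment behind the notebook can be recreated later, which is the harder half of the problem.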
