Hi Jan,

Really interesting points. I’ve played with a number of these systems, and I 
had a few thoughts, but no solid answers. I think your concerns are completely 
valid.

Regarding the IPython/Jupyter notebooks, the notebook format is pretty simple 
and JSON is ubiquitous, so I don’t see it going anywhere anytime soon. I 
suspect that reading the notebooks (rendering them statically, as nbviewer.org 
does) will be fine ten years from now. Executing them, on the other hand, seems 
less certain. It’s worth noting that the code cells record execution counts, 
which you could (in principle) replay in order. Or, just never archive a 
notebook without restarting the kernel and running it start to finish in one 
go. I wonder if there are some virtualenv tricks that could help with this; in 
that case, you could potentially bundle the whole Python environment that you 
used. As they stand, though, the notebooks are entirely dependent on the Python 
(or other kernel) environment behind them, and that’s not really frozen into 
the notebook.
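
To make the execution-order point concrete, here is a rough sketch of how one 
might check whether a saved notebook was run cleanly from top to bottom. It 
assumes the version 4 notebook layout (a top-level "cells" list in which each 
code cell carries an "execution_count"); the function names and the two tiny 
example notebooks are made up for illustration, and it only needs the standard 
library.

```python
import json


def code_cell_counts(notebook):
    """Collect execution counts of non-empty code cells, in document order."""
    counts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if not source.strip():
            continue  # empty cells are never assigned a count, so skip them
        counts.append(cell.get("execution_count"))
    return counts


def ran_top_to_bottom(notebook):
    """True if the counts are exactly 1, 2, 3, ... with no gaps or reordering."""
    counts = code_cell_counts(notebook)
    return counts == list(range(1, len(counts) + 1))


# Hypothetical notebook whose cells were executed out of order.
out_of_order = json.loads("""
{
  "nbformat": 4, "nbformat_minor": 0, "metadata": {},
  "cells": [
    {"cell_type": "code", "execution_count": 2, "metadata": {},
     "outputs": [], "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 1, "metadata": {},
     "outputs": [], "source": ["print(x)"]}
  ]
}
""")

# Hypothetical notebook that was reset and run start to finish.
clean_run = json.loads("""
{
  "nbformat": 4, "nbformat_minor": 0, "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    {"cell_type": "code", "execution_count": 1, "metadata": {},
     "outputs": [], "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 2, "metadata": {},
     "outputs": [], "source": ["print(x)"]}
  ]
}
""")

print(ran_top_to_bottom(out_of_order))  # prints False
print(ran_top_to_bottom(clean_run))     # prints True
```

A check like this only says whether the saved counts look like a clean run; it 
says nothing about whether the environment behind the kernel can be rebuilt.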

I guess I see Docker and Anaconda as very different beasts. Anaconda is a 
distribution that gives a (fairly) uniform environment across different 
operating systems, but it doesn’t really prevent version creep. Docker is all 
about building images up from a series of layered snapshots (on a graph, like 
commits in a Git history). However, it requires a tremendous amount of 
infrastructure to keep it running. I see Docker as being about deploying tools 
in the here and now, but I don’t really think it makes a good archival format. 
It’s also moving way too fast, with too little attention paid to deprecation.

Python virtualenvs might do the trick, but many of us rely on tools and 
libraries that a virtualenv isn’t going to capture. It’s a tough prospect.
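
Short of bundling a whole environment, one modest step against version creep 
is to record what was installed when the analysis ran and archive that next to 
the results. The following is only a sketch: the function name and the output 
layout are my own invention, it needs Python 3.8 or later for 
importlib.metadata, and it sees only Python packages, so tools and libraries 
outside the Python environment are still missed.

```python
import json
import platform
from importlib import metadata


def environment_snapshot():
    """Describe the running interpreter and its installed Python packages."""
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    )
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
    }


if __name__ == "__main__":
    # Print as JSON so the record can be saved alongside the analysis.
    print(json.dumps(environment_snapshot(), indent=2))
```

This records versions but does not pin or reinstall anything; rebuilding the 
environment years later still depends on those versions being obtainable.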

Best,
Brendan

—

Brendan Smithyman
Postdoctoral Fellow

Western University, Earth Sciences
Biological & Geological Sciences, Rm. 1045
London, ON, Canada N6A 5B7
c. 778.990.5957

> On Jan 13, 2016, at 2:06 PM, Jan Kim <[email protected]> wrote:
> 
> Dear All,
> 
> I'm preparing a talk centred on reproducible computing, and as this is
> a topic relevant and valued by SWC I'd like to ask your opinion and
> comments about this.
> 
> One approach I take is checking to what extent my work from 10 - 20
> years ago is reproducible today, and (perhaps not surprisingly) I found
> that having used make, scripts and (relatively) well defined text
> formats turns out to be highly beneficial in this regard.
> 
> This has led me to wonder about some of the tools that currently seem
> to be popular, including on this list, but to me appear unnecessarily
> fat / overloaded and as such to have an uncertain perspective for long
> term reproducibility:
> 
>    * "notebook" systems, and IPython / Jupyter in particular:
>      - Will the JSON format for saving notebooks be readable /
>        executable in the long term? 
>      - Are these even reproducible in a rigorous sense, considering
>        that results can vary depending on the order of executing cells?
> 
>    * Virtual machines and the recent lightweight "containerising"
>      systems (Docker, Conda): They're undoubtedly a blessing for
>      reproducibility but
>      - what are the long term perspectives of executing their images
>        / environments etc.?
>      - to what extent is their dependence on backing companies a
>        reason for concern?
> 
> I hope that comments on these are relevant / interesting to the SWC
> community, in addition to providing me with insights / inspiration,
> and that therefore posting this here is ok.
> 
> If you have comments on reproducible scientific computing in general,
> I'm interested as well --  please respond by mailing list or personal
> reply.
> 
> Best regards & thanks in advance, Jan
> -- 
> +- Jan T. Kim -------------------------------------------------------+
> |             email: [email protected]                                |
> |             WWW:   http://www.jtkim.dreamhosters.com/              |
> *-----=<  hierarchical systems are for files, not for humans  >=-----*
> 
> _______________________________________________
> Discuss mailing list
> [email protected]
> http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
