Hi everyone,

Although I generally agree with the sentiment that the less tooling, the
better, I couldn't let this thread go by without mentioning a library I
saw at EuroSciPy last year, Recipy:

https://github.com/recipy/recipy

It's a Python library, and to use it you just write "import recipy" at the
top of your scripts and it starts logging runtimes, input and output
arguments, git hashes, and so on. Input and output file hashes are on the
way. I think this kind of "zero effort" tool is extremely valuable.
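For concreteness, here's a minimal sketch of what an instrumented script
looks like. In a real script the first line is simply "import recipy"; the
try/except below is only so this sketch runs on machines without recipy
installed, and the file name and contents are made up for illustration:

```python
# Sketch of a recipy-instrumented script. In real use the first line is
# simply "import recipy"; the try/except only lets this sketch run where
# recipy isn't installed.
try:
    import recipy  # noqa: F401 -- patches file I/O to log run provenance
except ImportError:
    pass  # without recipy the script runs normally, just unlogged

# An ordinary script follows; recipy records files opened for writing
# (like results.txt below) as outputs of this run, along with the script
# name, timestamps, and git hash.
with open("results.txt", "w") as f:
    f.write("mean = 3.14\n")
```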

I hope someone finds it useful! Plus I think this list has the kinds of
people who might contribute to the tool. =)

Juan.

On Thu, Jan 14, 2016 at 9:46 AM, Damien Irving <
[email protected]> wrote:

> @Tim - I'm interested in your comment regarding Dockerfiles: "A human can
> read and re-produce it if we lost all the docker tools"
>
> Is the same true for any alternatives? For instance, if I put my
> environment up on anaconda.org, is there a Dockerfile equivalent that
> would allow it to be read by a human and reproduced even if conda
> disappeared?
>
>
>
> On Thu, Jan 14, 2016 at 9:39 AM, Tim Head <[email protected]> wrote:
>
>>
>>
>> On Wed, Jan 13, 2016 at 9:52 PM C. Titus Brown <[email protected]>
>> wrote:
>>
>>>
>>> "What I need to run it" has been much, much more problematic over the
>>> 23 years I've been doing this stuff (pardon my gout ;). My code can
>>> unfortunately depend on all sorts of UNIX gobbledygook, down to specific
>>> (and recent) versions of gcc. Only with the advent of full virtualization
>>> (and now the cloud and Docker) have I found what I think is an acceptable
>>> solution. The specific execution environment isn't all that important, be
>>> it cloud, Docker or a VM; it's the idea of being able to *computationally*
>>> specify the environment that is important. And that is where Docker, in
>>> particular, excels.
>>>
>>>
>> Reproducing the environment in which something was run is at least 50% of
>> the difficulty of replicating something.
>>
>> I think an important point in the whole Docker story is that the
>> Dockerfile is far more valuable than the image it produces. The Dockerfile
>> is almost a standard for specifying how to obtain the environment. A human
>> can read and re-produce it if we lost all the docker tools as well as LXC
>> support in the kernel. It would be painful but it could be done. This is
>> why I believe people have to publish the Dockerfile, not just the image
>> that is produced by it.
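To illustrate the point, a hypothetical minimal Dockerfile: even with no
Docker tooling at all, each instruction reads as a shell command a person
could run by hand on a fresh Ubuntu machine (the base image, package names,
versions, and file names below are made up for illustration):

```dockerfile
# Hypothetical environment spec; all names and versions are illustrative.
# Start from a known base system:
FROM ubuntu:14.04
# Install system-level dependencies:
RUN apt-get update && apt-get install -y gcc python python-pip
# Pin the Python dependencies:
RUN pip install numpy==1.10.2
# Add the analysis code and say how to run it:
COPY analysis.py /work/analysis.py
CMD ["python", "/work/analysis.py"]
```

With or without Docker, that is a complete, ordered recipe for rebuilding
the environment, which is exactly what a binary image on its own does not
give you.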
>>
>>
>>> On the flip side, I've found that I don't really need Docker, or VMs, in
>>> my own work - it's just when I'm conveying it to others that it's useful.
>>>
>>
>> You need to work more on shared systems where sys admins can
>> replace/upgrade things while you are on holiday ;)
>>
>>> I think the JSON format of the notebook is problematic (although
>>> it's understandable why they went that way). The RMarkdown format is
>>> kinda nice and simple, and easily parseable.
>>>
>>>
>> nbconvert can create markdown with code blocks from a notebook, and there
>> are tools that will run it (or convert it back to a notebook). I think
>> these tools will gain more popularity for executable papers in the future.
>> They solve the "in what order should I run this" problem, are diff-able,
>> and you can read them in a text editor and follow along if we lose all the
>> jupyter tooling.
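As a sketch of that round trip (nbconvert ships with Jupyter; notedown is
one third-party tool that goes the other way; the file names here are
placeholders):

```shell
# Notebook -> markdown with code blocks (diff-able, readable in any editor):
jupyter nbconvert --to markdown analysis.ipynb   # writes analysis.md

# Markdown -> notebook again, with a tool such as notedown:
notedown analysis.md > analysis.ipynb
```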
>>
>> T
>>
>> _______________________________________________
>> Discuss mailing list
>> [email protected]
>>
>> http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
>>
>
>
