From my understanding, what you are describing is what bioinformatics
folks call a workflow:
1- fetch data here and there
2- clean and prepare data
3- compute stuff with these data
4- obtain an answer
and loop several times over several data sets.
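The four numbered steps above could be sketched like this; a minimal Python sketch with made-up helpers (fetch, clean, compute), not actual GWL code:

```python
def fetch(source):
    # 1- fetch data here and there (placeholder: return the source's records)
    return list(source)

def clean(records):
    # 2- clean and prepare data (here: drop missing values)
    return [r for r in records if r is not None]

def compute(records):
    # 3- compute stuff with these data (here: a simple mean)
    return sum(records) / len(records)

def run_workflow(datasets):
    # 4- obtain an answer per data set, looping over all of them
    return [compute(clean(fetch(d))) for d in datasets]

answers = run_workflow([[1, 2, None, 3], [4, None, 6]])
# answers == [2.0, 5.0]
```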
The Guix Workflow Language (GWL) lets you implement such a workflow,
i.e., all the steps and the links between them that deal with the data.
And because it builds on Guix, reproducibility in terms of software comes almost for free.
Moreover, with the channel mechanism, there is also a way to
share these workflows.
I think the tools are there, modulo UI and corner cases. :-)
From my point of view, workflows are missing mainly because of manpower
(few Lisp-inclined people, etc.).
Last, a workflow is not necessarily reproducible bit-for-bit, since some
algorithms use randomness.
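To make the randomness point concrete, here is a minimal Python sketch (the noisy_estimate function is hypothetical): an unseeded run can vary between executions, while pinning the seed restores bit-identical output.

```python
import random

def noisy_estimate(data, seed=None):
    # Simulate an algorithm that uses randomness (e.g. subsampling):
    # average three randomly chosen elements.
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in range(3)]
    return sum(sample) / len(sample)

data = [1, 2, 3, 4, 5]
# With a fixed seed, two runs give exactly the same result.
assert noisy_estimate(data, seed=42) == noisy_estimate(data, seed=42)
```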
Hope that helps.
All the best,
On 9 February 2018 at 18:13, Ludovic Courtès <ludovic.cour...@inria.fr> wrote:
> Amirouche Boubekki <amirou...@hypermove.net> skribis:
>> tl;dr: Distribution of data and software seems similar.
>> Data is more and more important in software and reproducible
>> science. The data science ecosystem lacks resource sharing.
>> I think guix can help.
> I think some of us, especially Guix-HPC folks, are convinced about the
> usefulness of Guix as one of the tools in the reproducible science
> toolchain (that was one of the themes of my FOSDEM talk). :-)
> Now, whether Guix is the right tool to distribute data, I don’t know.
> Distributing large amounts of data is a job in itself, and the store
> isn’t designed for that. It could quickly become a bottleneck. That’s
> one of the reasons why the Guix Workflow Language (GWL) does not store
> scientific data in the store itself.
> I think data should probably be stored and distributed out-of-band using
> appropriate storage mechanisms.