Hi Amirouche,

Amirouche Boubekki <[email protected]> skribis:

> On 2018-02-09 18:13, [email protected] wrote:
>> Hi!
>>
>> Amirouche Boubekki <[email protected]> skribis:
>>
>>> tl;dr: Distribution of data and software seems similar.
>>> Data is more and more important in software and reproducible
>>> science.  Data science ecosystem lacks resources sharing.
>>> I think guix can help.
>>
>> I think some of us especially Guix-HPC folks are convinced about the
>> usefulness of Guix as one of the tools in the reproducible science
>> toolchain (that was one of the themes of my FOSDEM talk).  :-)
>>
>> Now, whether Guix is the right tool to distribute data, I don’t know.
>> Distributing large amounts of data is a job in itself, and the store
>> isn’t designed for that.  It could quickly become a bottleneck.
>
> What does it mean technically that the store “isn't designed for that”?

There are several potential issues.  One is GC: how convenient is it to
have big datasets subject to GC?  Another one is the I/O bottleneck: when
adding a file to the store, you currently do an ‘add-to-store’ RPC to
the daemon and pass it the file name; the daemon then reads the file
entirely to compute its content hash, which could be an issue with big
datasets.

HTH,
Ludo’.
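
To make the I/O point concrete, here is a minimal sketch in Python.  It
is not the daemon's code and does not talk to the daemon at all: the
function names, the chunk size, and the choice of plain SHA-256 over the
raw file bytes are assumptions made for illustration only.  What it
shows is that computing a content hash means reading every byte of the
file, so the cost grows linearly with the size of the dataset.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def content_hash(path, chunk_size=CHUNK_SIZE):
    """Return (hex digest, bytes read) for the file at `path`.

    Hypothetical helper: it only illustrates that hashing a file's
    content requires streaming the whole file through the hash function.
    """
    digest = hashlib.sha256()
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            digest.update(chunk)
            bytes_read += len(chunk)
    return digest.hexdigest(), bytes_read


def make_sample_file(size):
    """Create a temporary file of `size` zero bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


if __name__ == "__main__":
    path = make_sample_file(5 * CHUNK_SIZE)
    try:
        digest, n = content_hash(path)
        # The number of bytes read always equals the file size.
        print(f"read {n} of {os.path.getsize(path)} bytes; sha256 = {digest}")
    finally:
        os.remove(path)
```

Memory use stays constant thanks to the chunked reads, but the amount of
I/O does not: a dataset of several hundred gigabytes would be read in
full before anything else can happen.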
