Hello Ricardo & all!

Ricardo Wurmus <rek...@elephly.net> wrote:
> I’m happy to announce that the group I’m working with has released a
> preprint of a paper on reproducibility with the title:
>
>   Reproducible genomics analysis pipelines with GNU Guix
>   https://www.biorxiv.org/content/early/2018/04/11/298653
>
> We built a collection of bioinformatics pipelines and packaged them with
> GNU Guix, and then looked at the degree to which the software achieves
> bit-reproducibility (spoiler: ~98%), analysed sources of non-determinism
> (e.g. time stamps), discussed experimental reproducibility at runtime
> (e.g. random number generators, kernel+glibc interface, etc.) and
> commented on the idea of using “containers” (or application bundles)
> instead.

Very impressive piece of work!

I think it’s important to stress that reproducible builds are a crucial
foundation for reproducible computational experiments, and this paper
does a great job of showing it.  It’s also nice that you show you can have
these bit-reproducible pipelines formalized in Guix *and* produce a
ready-to-use “container image.”

Hopefully we can soon address the remaining sources of non-determinism
shown in Table 3 (I think you have already addressed some of them in the
meantime, haven’t you?).

The bit I’m less comfortable with is the use of Autotools.  I do
understand how it helps capture configure-time dependencies, and how it
generally helps people package and use the software; I think it’s one of
the best tools for the job.  However, it’s also hard to learn and,
justified or not, it’s widely considered “scary.”

Given the intended audience, I wonder how we could provide a simpler path
to the same goal.  It could be a set of Autoconf macros leading to
high-level ‘configure.ac’ files without a single line of shell code, or it
could be Guix interpreting a top-level .scm or JSON file; either would
ideally be easier for bioinformaticians to write.  What are your thoughts
on this?

Anyway, kudos on this, and thank you!

Ludo’.