(Resending this as it could not be delivered.)

Ricardo Wurmus <[email protected]> writes:

> Hi Roel,
>
>> With GNU Guix we are able to install programs to our machines with an amazing
>> level of control over the dependency graph of the programs. We can now know
>> what code will run when we invoke a program. We can now know what the impact
>> of an upgrade will be. And we can now safely roll-back to previous states.
>>
>> What seems to be a common practice in research involving data analysis, is
>> running multiple programs in a chain to transform data from raw to specific.
>> This is often referred to as a "pipeline" or a "workflow". Because data sets
>> can be quite large in comparison to the computing power of our laptops, the
>> data analysis is performed on computing clusters instead of single machines.
>>
>> The usage of a pipeline/workflow is somewhat different from the package
>> construction, because we want to run the sequence of commands on different
>> data sets (as opposed to running it on the same source code). Plus, I would
>> like to integrate it with existing computing clusters that have a job
>> scheduling system in place.
>>
>> The reason I think this should be possible with Guix is that it has
>> everything in place to do software deployment and run-time isolation
>> (containers). From there it is a small step to executing programs in an
>> automated way.
>>
>> So, I would like to propose a new Guix subcommand and an extension to
>> the package management language to add workflow management features.
>
> I probably don’t understand your idea well enough, but from what I
> understand it doesn’t really have much to do with packages (other than
> using them) and store manipulation per se (produced artifacts are not
> added to the store). Exactly what features of Guix do you want to build
> on?
>
> My perspective on pipelines is that they should be developed like any
> other software package, treating individual tools as you would treat
> libraries.
> This means that a pipeline would have a configuration step
> in which it checks for the paths of all tools it needs internally, and
> then use the full paths rather than assume all tools to be in a
> directory listed in the PATH variable.
>
> Distributing jobs to clusters would be the responsibility of the
> pipeline, e.g. by using DRMAA, which supports several resource
> management backends and has bindings for a wide range of programming
> languages.
>
>> Would this be a feature you are interested in adding to GNU Guix?
>
> Even if it wasn’t part of Guix itself, you could develop it separately
> and still add it as a Guix command, much like it is currently done for
> “guix web” (which I think should eventually be part of Guix).
>
>> I'm currently working on a proof-of-concept implementation that has three
>> record types/levels of abstraction:
>> <workflow>: Describes which <process>es should be run, and concerns itself
>>             with the order of execution.
>>
>> <process>:  Describes what packages are needed to run the programs involved,
>>             and its relationship to other processes. Processes take input
>>             and generate output much like the package construction process.
>>
>> <script>:   Short and simple imperative instructions to perform a task.
>>             They are part of a <process>. Currently, my implementation
>>             generates a shell script that can be either Guile, Sh, Perl or
>>             Python.
>
> From that list it seems as if the only link to Guix is ensuring the
> environment contains required programs. This can be done right now with
> the help of manifests and profiles.
>
> I wonder if maybe we could add Guix as a package management backend to
> existing workflow specification systems (instead of the curiously
> popular and IMO barely adequate Conda, for example).
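
To make the "configuration step" idea above concrete, here is a rough
sketch in Python of what such a check might look like. It is only an
illustration under my own assumptions: the function name resolve_tools
and its search_path parameter are made up for this example and are not
part of Guix or of any existing pipeline. It relies on shutil.which from
the Python standard library.

```python
import os
import shutil


def resolve_tools(names, search_path=None):
    """Return a dict mapping each tool name to the absolute path it resolves to.

    `search_path` is an optional PATH-style string (hypothetical parameter for
    this sketch); None means "use the current PATH". All missing tools are
    reported together so the pipeline fails before any job is started.
    """
    resolved = {}
    missing = []
    for name in names:
        found = shutil.which(name, path=search_path)
        if found is None:
            missing.append(name)
        else:
            # Store the absolute path so later steps never consult PATH again.
            resolved[name] = os.path.abspath(found)
    if missing:
        raise RuntimeError("missing tools: " + ", ".join(sorted(missing)))
    return resolved
```

A pipeline would call this once at start-up, for instance with
search_path set to the bin directory of a profile, and then invoke every
tool through the returned paths only.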
>
>> The subcommand I envision is:
>>   guix workflow
>>
>> With primarily:
>>   guix workflow --run=<name-of-workflow-definition>
>>
>> If you are interested in adding any form of workflow management to GNU Guix,
>> I can elaborate on my proof-of-concept implementation, so we can work from
>> there.
>> (or throw everything out of the window and start from scratch ;-))
>
> Could you show us an example workflow?
>
> ~~ Ricardo
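
As a stand-in for an example workflow, the three levels of abstraction
quoted above (<workflow>, <process>, <script>) can be modelled roughly
as follows. This is a Python sketch for illustration only, not the
proof-of-concept implementation: every class, field, process and package
name below is an assumption made for this example, and nothing here
talks to Guix. The order of execution is derived with
graphlib.TopologicalSorter (Python 3.9 or later).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Script:
    """Short imperative instructions; `interpreter` names the language."""
    interpreter: str  # e.g. "sh", "guile", "perl" or "python"
    body: str


@dataclass(frozen=True)
class Process:
    """One step: the packages it needs and the processes it waits for."""
    name: str
    packages: tuple
    script: Script
    depends_on: tuple = ()  # names of processes that must finish first


@dataclass
class Workflow:
    """A set of processes; only concerned with the order of execution."""
    name: str
    processes: list

    def execution_order(self):
        # Map each process to the set of processes it depends on, then
        # let the topological sort produce a valid run order.
        graph = {p.name: set(p.depends_on) for p in self.processes}
        return list(TopologicalSorter(graph).static_order())


# Hypothetical three-step chain; listed out of order on purpose.
workflow = Workflow(
    name="example",
    processes=[
        Process("summarize", ("gawk",),
                Script("sh", "awk '{ n++ } END { print n }' table.txt"),
                depends_on=("transform",)),
        Process("extract", ("gzip",),
                Script("sh", "gunzip -c raw.gz > raw.txt")),
        Process("transform", ("coreutils",),
                Script("sh", "sort raw.txt > table.txt"),
                depends_on=("extract",)),
    ],
)

print(workflow.execution_order())  # ['extract', 'transform', 'summarize']
```

In this sketch the packages field is only carried along as data. In a
real implementation it is the part that would be handed to Guix, for
example to build a profile for each process before its script is run.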
