Re: Workflow management with GNU Guix

Ricardo Wurmus Mon, 16 May 2016 05:22:41 -0700

(Resending this as it could not be delivered.)

Ricardo Wurmus <[email protected]> writes:


> Hi Roel,
>
>> With GNU Guix we are able to install programs on our machines with an amazing
>> level of control over the dependency graph of the programs.  We can now know
>> what code will run when we invoke a program.  We can now know what the impact
>> of an upgrade will be.  And we can now safely roll back to previous states.
>>
>> What seems to be a common practice in research involving data analysis is
>> running multiple programs in a chain to transform data from raw to specific.
>> This is often referred to as a "pipeline" or a "workflow".  Because data sets
>> can be quite large in comparison to the computing power of our laptops, the
>> data analysis is performed on computing clusters instead of single machines.
>>
>> The usage of a pipeline/workflow is somewhat different from the package
>> construction, because we want to run the sequence of commands on different
>> data sets (as opposed to running it on the same source code).  Plus, I would
>> like to integrate it with existing computing clusters that have a job
>> scheduling system in place.
>>
>> The reason I think this should be possible with Guix is that it has
>> everything in place to do software deployment and run-time isolation
>> (containers).  From there it is a small step to executing programs in an
>> automated way.
>>
>> So, I would like to propose a new Guix subcommand and an extension to
>> the package management language to add workflow management features.
>
> I probably don’t understand your idea well enough, but from what I
> understand it doesn’t really have much to do with packages (other than
> using them) and store manipulation per se (produced artifacts are not
> added to the store).  Exactly what features of Guix do you want to build
> on?
>
> My perspective on pipelines is that they should be developed like any
> other software package, treating individual tools as you would treat
> libraries.  This means that a pipeline would have a configuration step
> in which it checks for the paths of all tools it needs internally, and
> then use the full paths rather than assume all tools to be in a
> directory listed in the PATH variable.
>
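A configuration step of this kind could be sketched in Python roughly as
follows.  This is only an illustration, not part of any existing pipeline:
the function name `resolve_tools` is an assumption, and the only library
call relied on is `shutil.which` from the standard library.

```python
import os
import shutil


def resolve_tools(names, search_path=None):
    """Map each tool name to its absolute path, or fail listing all missing tools.

    `search_path` is an optional PATH-style string; None means the current PATH.
    (Hypothetical helper, named here for illustration only.)
    """
    resolved, missing = {}, []
    for name in names:
        path = shutil.which(name, path=search_path)
        if path is None:
            missing.append(name)
        else:
            # Store the absolute path so later invocations never consult PATH.
            resolved[name] = os.path.abspath(path)
    if missing:
        raise RuntimeError("required tools not found: " + ", ".join(missing))
    return resolved
```

A pipeline would call this once at configuration time and afterwards invoke
the recorded path (for example `tools["sometool"]`, a placeholder name)
instead of the bare command name.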
> Distributing jobs to clusters would be the responsibility of the
> pipeline, e.g. by using DRMAA, which supports several resource
> management backends and has bindings for a wide range of programming
> languages.
>
>> Would this be a feature you are interested in adding to GNU Guix?
>
> Even if it wasn’t part of Guix itself, you could develop it separately
> and still add it as a Guix command, much like it is currently done for
> “guix web” (which I think should eventually be part of Guix).
>
>> I'm currently working on a proof-of-concept implementation that has three
>> record types/levels of abstraction:
>> <workflow>:  Describes which <process>es should be run, and concerns itself
>>              with the order of execution.
>>
>> <process>:   Describes what packages are needed to run the programs involved,
>>              and its relationship to other processes.  Processes take input
>>              and generate output much like the package construction process.
>>
>> <script>:    Short and simple imperative instructions to perform a task.
>>              They are part of a <process>.  Currently, my implementation
>>              generates a shell script that can be either Guile, Sh, Perl or
>>              Python.
>
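To make the three levels concrete, here is a rough Python sketch of how they
might relate.  It is not the proof-of-concept implementation described above
(which is not shown in this thread); every class and field name below is an
assumption chosen for illustration, and the execution order is derived with
the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Script:
    interpreter: str  # e.g. "sh", "guile", "perl" or "python"
    source: str       # the imperative instructions themselves


@dataclass(frozen=True)
class Process:
    name: str
    packages: tuple          # names of the packages the script needs
    script: Script
    depends_on: tuple = ()   # names of processes that must finish first


@dataclass
class Workflow:
    name: str
    processes: list = field(default_factory=list)

    def execution_order(self):
        """Return process names ordered so dependencies come first."""
        graph = {p.name: set(p.depends_on) for p in self.processes}
        return list(TopologicalSorter(graph).static_order())
```

With processes "align", "sort" (depending on "align") and "call" (depending
on "sort"), `execution_order()` yields them in that order regardless of the
order in which they were listed.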
> From that list it seems as if the only link to Guix is ensuring the
> environment contains required programs.  This can be done right now with
> the help of manifests and profiles.
>
> I wonder if maybe we could add Guix as a package management backend to
> existing workflow specification systems (instead of the curiously
> popular and IMO barely adequate Conda, for example).
>
>> The subcommand I envision is:
>>   guix workflow
>>
>> With primarily:
>>   guix workflow --run=<name-of-workflow-definition>
>>
>> If you are interested in adding any form of workflow management to GNU Guix,
>> I can elaborate on my proof-of-concept implementation, so we can work from
>> there.
>> (or throw everything out of the window and start from scratch ;-))
>
> Could you show us an example workflow?
>
> ~~ Ricardo
