Re: Guix on clusters and in HPC

Roel Janssen Tue, 18 Oct 2016 09:55:51 -0700

Ludovic Courtès writes:

> Hello,
>
> I’m trying to gather a “wish list” of things to be done to facilitate
> the use of Guix on clusters and for high-performance computing (HPC).
>
> Ricardo and I wrote about the advantages, shortcomings, and perspectives
> before:
>
>   http://elephly.net/posts/2015-04-17-gnu-guix.html
>   https://hal.inria.fr/hal-01161771/en
>
> I know that Pjotr, Roel, Ben, Eric and maybe others also have experience
> and ideas on what should be done (and maybe even code? :-)).
>
> So I’ve come up with an initial list of work items going from the
> immediate needs to crazy ideas (batch scheduler integration!) that
> hopefully make sense to cluster/HPC people.  I’d be happy to get
> feedback, suggestions, etc. from whoever is interested!
>
> (The reason I’m asking is that I’m considering submitting a proposal at
> Inria to work on some of these things.)
>
> TIA!  :-)


Here are some aspects I think we need:

* Network-aware guix-daemon

  From a user's point of view it would be cool to have a network-aware
  guix-daemon.  On our cluster the store lives on shared storage, but
  manipulating the store through guix-daemon is currently limited to a
  single node (and a single request per profile).  Having `guix' talk
  to `guix-daemon' over the network would allow users to install stuff
  from any node, instead of from one specific node.

* Profile management

  Profiles are an awesome abstraction of functional package management
  (FPM), but a user interface for working with them is missing.  We
  could do better here, for example with commands like these:

  Switch the default profile
  (and prepend values of environment variables to the current values):
  $ guix profile --switch=/path/to/shared/profile

  Reset to the default profile (and restore the environment variables to
  their values without the profile we just unset):
  $ guix profile --reset

  Create an isolated environment based on a profile:
  $ guix environment --profile=/path/to/profile --pure --ad-hoc
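
  To make the intended switch/reset semantics a bit more concrete, here
  is a minimal sketch in Python.  It is not Guix code: the function
  names, the `SEARCH_PATHS' table and the assumption that a profile only
  contributes `bin' and `share/man' directories are all made up for
  illustration.

```python
import os

# Hypothetical helpers illustrating the proposed semantics; not Guix code.
# Assumption: a profile contributes entries such as <profile>/bin to
# colon-separated search-path variables.
SEARCH_PATHS = {"PATH": "bin", "MANPATH": "share/man"}


def switch_profile(env, profile):
    """Return a copy of env with the profile's directories prepended."""
    new_env = dict(env)
    for variable, subdir in SEARCH_PATHS.items():
        entry = os.path.join(profile, subdir)
        current = new_env.get(variable, "")
        new_env[variable] = entry + (os.pathsep + current if current else "")
    return new_env


def reset_profile(env, profile):
    """Return a copy of env without the entries that profile contributed."""
    new_env = dict(env)
    for variable, subdir in SEARCH_PATHS.items():
        entry = os.path.join(profile, subdir)
        kept = [p for p in new_env.get(variable, "").split(os.pathsep)
                if p and p != entry]
        if kept:
            new_env[variable] = os.pathsep.join(kept)
        else:
            # The variable only existed because of the profile; drop it.
            new_env.pop(variable, None)
    return new_env
```

  Both functions return a new dictionary instead of touching the current
  environment, so switching followed by resetting gives back the values
  we started with.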

* Workflow management/execution

  Add automatic program execution with its own vocabulary.  I think
  "workflow management" boils down to execution of a G-exp, but the
  results do not necessarily need to be stored in the store (because the
  data it works on is probably managed by an external data management
  system).  A powerful feature of GNU Guix is its domain-specific
  language for describing software packages.  We could add
  domain-specific parts for workflow management (a `workflow' data type
  and a `task' or `process' data type would get us there more or less).

  With workflow management we are only interested in the "build
  function", not the "source code" or the "build output".

  You are probably aware that I worked on this for some time, so I could
  share the data types and the execution engine parts I have.

  The HPC-specific part of this is the compatibility with existing job
  scheduling systems and data management systems.
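
  As a rough illustration of the `workflow' and `process' idea (this is
  not the code I mentioned above, and all names in it are hypothetical),
  a minimal sketch in Python could look like the following.  A process
  carries only a procedure, the "build function", and the workflow
  orders processes by their dependencies.  It assumes Python 3.9 or
  later for `graphlib'.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Process:
    """One step: a name plus the procedure to run (the "build function")."""
    name: str
    procedure: Callable[[], None]


@dataclass
class Workflow:
    """Processes plus, for each process name, the names it depends on."""
    name: str
    processes: List[Process]
    dependencies: Dict[str, List[str]] = field(default_factory=dict)

    def execution_order(self) -> List[Process]:
        by_name = {p.name: p for p in self.processes}
        graph = {p.name: self.dependencies.get(p.name, [])
                 for p in self.processes}
        # static_order() yields prerequisites before their dependents and
        # raises CycleError if the dependencies are circular.
        return [by_name[n] for n in TopologicalSorter(graph).static_order()]

    def run(self) -> None:
        # Only the procedure is executed; nothing is written to a store.
        for process in self.execution_order():
            process.procedure()
```

  In this sketch `run' simply calls each procedure locally.  An
  HPC-specific engine would presumably submit each process to the job
  scheduler at that point instead.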

* Document why the Guix daemon needs superuser privileges

  Probably an infamous point by now.  By design, the Linux kernel keeps
  control over all processes.  With GNU Guix, we need some control over
  the environment in which a process runs (disable network access,
  change the user that executes a process), and the environment in which
  the output lives (chown, chmod, to allow multiple users to use the
  build output).  Instead of hitting the wall of "we are not going to
  run this thing with root privileges", we could present our sysadmins
  with a document that explains the reasons, the design decisions and
  the actual code that requires superuser privileges.

  This is something I am working on as well, but help is always welcome
  :-).


Kind regards,
Roel Janssen
