Hi,

How would you go about bringing the benefits of Nix to the users of a compute 
cluster?

Assume the following cluster: a login node, a file-system node, and a number of 
compute nodes. All nodes run a recent CentOS and are fairly homogeneous. The 
fs node holds all user data and some common libraries. Its storage is 
NFS-mounted on all other nodes.

Users ssh into the login node, write and compile some code, and then use the 
Sun Grid Engine (SGE) to submit compute jobs. Once these have finished, they 
copy the results to their workstations and are happy.

There are subgroups of users with fairly exotic software requirements. The 
software they need is not available in any package repository, and the cluster 
admin doesn't have the time to install and maintain it. So, currently, most of 
these users just compile everything themselves in their home directories, which 
is a huge waste of time and storage space.

I would like to suggest Nix to the admin as a way to let these user subgroups 
manage their own packages in a well-organized manner that avoids redundant work 
and storage. But I'm not sure how exactly that should work.

There are a few constraints:

  1. Unfortunately, NixOS/NixOps is not an option. This will have to work with 
the currently installed cluster OS.
  2. Compilation should not put too much load on the login node. Ideally, build 
jobs would be offloaded to the compute nodes.
  3. Build jobs on the compute nodes should be managed by SGE.
  4. (Some) users should be allowed to initiate builds, and to use their own 
overrides of packages and extra packages.
  5. Some impurity is necessary, be it for things that are hard to package 
(e.g. the Intel compiler) or for global state (MPI jobs).
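
To make constraints 2 and 3 a bit more concrete, here is a rough Python sketch 
of what "a build managed by SGE" might look like: it only assembles a `qsub` 
command line that wraps `nix-build`. The helper name `sge_build_command` and 
the job-naming scheme are made up for illustration. It also assumes that 
`nix-build` is on the PATH of the compute nodes and is allowed to write to the 
shared store from there, which is exactly the part I am unsure about.

```python
import shlex


def sge_build_command(attr, nix_file="<nixpkgs>", job_name=None):
    """Return an argv list that submits `nix-build -A attr` through qsub.

    Hypothetical helper: nothing here is an existing Nix or SGE integration.
    """
    # SGE job names should not contain odd characters, so replace the dots
    # that appear in nested attribute paths such as "pythonPackages.numpy".
    name = job_name or "nix-build-" + attr.replace(".", "-")
    return [
        "qsub",
        "-b", "y",     # the job is a command, not a script file
        "-sync", "y",  # block until the job has finished
        "-cwd",        # run in the current working directory
        "-j", "y",     # merge stderr into stdout
        "-N", name,    # job name shown in qstat
        "nix-build", nix_file, "-A", attr, "--no-out-link",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(shlex.quote(arg) for arg in sge_build_command("hello")))
```

Because of `-sync y` the caller waits for the build, so something like this 
could sit behind whatever mechanism ends up triggering builds on the login 
node.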

My question to you: Do you think this is possible to achieve (within a 
reasonable time-frame), and how would you do it?

Here's what I have in mind so far (please feel free to take it apart if you 
think there is a better way):

Have a Nix store on the file server and NFS-mount it on all nodes (cached). The 
login node runs the nix-daemon. Builds are deferred to the grid engine (how?), 
executed on the compute nodes, and their results stored in the NFS-mounted Nix 
store. Users would use `nix-env` on the login node to install software into 
their profile. This profile should be visible on all nodes, so that jobs can 
use the libraries and tools in it. Things like myEnvFun should allow running 
jobs in different software environments simultaneously.
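
As an illustration of the last part (jobs picking up the user's profile), here 
is a small Python sketch that generates an SGE job script which puts the 
profile's `bin` directory on the PATH before running the actual command. The 
function name `sge_job_script` is hypothetical, and the sketch assumes that 
`$HOME/.nix-profile` resolves on the compute nodes, i.e. that the home 
directories, the Nix store, and the profile directories are all mounted there 
under the same paths as on the login node.

```python
def sge_job_script(command, profile="$HOME/.nix-profile"):
    """Return the text of an SGE job script that runs `command` with the
    given Nix profile on the PATH. Illustrative helper, not an existing tool.
    """
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",  # ask SGE to run the job with bash
        "#$ -cwd",          # start in the submission directory
        # Prepend the profile so its tools shadow the system ones.
        'export PATH="%s/bin:$PATH"' % profile,
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "my-simulation" is a placeholder for whatever the job really runs.
    print(sge_job_script("my-simulation --input data.h5"), end="")
```

The resulting text could be written to a file and handed to `qsub`. Passing a 
different `profile` per job would be one way to run jobs in different software 
environments side by side.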

Best,

Andreas
_______________________________________________
nix-dev mailing list
[email protected]
http://lists.science.uu.nl/mailman/listinfo/nix-dev
