Hi EasyBuild folks, (with CC to Hashdist list)

I'm funded to work for two months on the Hashdist project (which only exists on paper at the moment), and had a nice conversation with Kenneth and Jens from EasyBuild today. The conclusion seems to be that Hashdist and EasyBuild may complement one another nicely and are mostly orthogonal.

Minutes from our call: https://github.com/hpcugent/easybuild/wiki/Notes-on-EasyBuild-HashDist-conf-call-%2820121126%29

The aim of Hashdist is to accelerate the development of existing and future (scientific) software distribution systems, by providing some core tools that can be shared between them.

Currently, all software distribution systems lack features I need or want. Rather than waste my time trying to write yet another distribution framework and only getting 10% of the way there, I want to develop just those features that I think are missing, and then hope that existing systems such as EasyBuild pick them up and use them.

Thus Hashdist is not meant to be used directly (except perhaps for a few power-users) but rather as a component in other distribution systems.

Hashdist will be a set of loosely coupled tools. The list below is my personal wishlist, which may be adjusted as the project proceeds:

a) A source store mechanism for downloading and hashing source code (the hashing bit being the important part).

You already have a lot of this in EasyBuild, though some other systems don't. Perhaps you can ignore this part (or only use it to get the hashes to pass to b).
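To make a) a bit more concrete, here is a minimal sketch of a content-addressed source store. Nothing here is decided for Hashdist: the function names, the flat directory layout and the choice of SHA-256 are all assumptions made purely for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def hash_source(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_source(path, store_dir):
    """Copy a source archive into store_dir, named by its digest.

    Hypothetical layout: one file per archive, keyed by the hash alone.
    Returns the key, which is what would be handed on to b).
    """
    key = hash_source(path)
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    target = store / key
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(path, target)
    return key
```

The point is only that the key depends on the bytes of the archive and not on its file name or download URL.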

b) A "prefix database system" based on hashing; e.g.,

~/.hashdist/artifacts/numpy/1.7/a4324sdfq32r

(The exact path-name pattern will probably be configurable, that's one of the things I want to engage you in discussing.)

This is what you already have in EasyBuild, except that a cryptographic hash is included in the path name. If you make a minor change, such as a minor-version gcc upgrade, a change to CFLAGS, or a minor patch to your git tree that you want to quickly try out, the hash changes and causes a new parallel build/installation.

The "try it out" bit is important. I want jumping around between slightly different software stacks to be as quick and easy as using git, and this relies on the hashes being quite reliable.
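As a rough sketch of how a path like the one in b) could be derived: hash a description of everything that influences the build, and use the digest as the last path component. The fields of the spec dictionary, the JSON serialization and the 12-character truncation are assumptions for illustration only; the real pattern is exactly what I'd like to discuss.

```python
import hashlib
import json
import os


def artifact_hash(spec):
    """Hash a build description (a JSON-serializable dict).

    Keys are sorted so that the same spec always gives the same digest,
    regardless of the order in which the dict was filled in.
    """
    payload = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def artifact_path(root, name, version, spec):
    """Build a path of the form <root>/<name>/<version>/<hash>."""
    return os.path.join(root, name, version, artifact_hash(spec))


# Hypothetical spec: source hash, compiler artifact and flags all feed the hash.
spec = {"source": "abc123", "gcc": "gcc/4.6.3/34qw3da32e4q2", "CFLAGS": "-O2"}
path_o2 = artifact_path("~/.hashdist/artifacts", "numpy", "1.7", spec)
path_o3 = artifact_path("~/.hashdist/artifacts", "numpy", "1.7",
                        dict(spec, CFLAGS="-O3"))
# path_o2 and path_o3 differ only in the final component, so both builds
# can be installed side by side.
```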

c) A tool for capturing the system software and hashing it and making a prefix that symlinks to it; i.e.,

~/.hashdist/artifacts/gcc/4.6.3/34qw3da32e4q2 # symlinks to /usr/...

The point is simply that if the system software is upgraded we want to track it somehow in the hashes of the dependencies. (Details on this to be hashed out but I have some main ideas ready.)

d) A light-weight (optional!) jail tool to make sure that all dependencies are explicitly stated when creating packages, so that the following command would fail if anything is pulled in from /usr/lib which wasn't first accessed through a symlink created in c) above:

LD_PRELOAD=hdistjail.so gcc ....

e) Garbage collection to remove prefixes that are no longer used
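One way e) could work is a simple mark-and-sweep: start from the prefixes that are still wanted, follow their dependencies, and treat everything not reached as garbage. How "in use" is actually determined is still open; the roots list and the dependency mapping below are assumed inputs, not an existing Hashdist format.

```python
def unreachable_artifacts(all_artifacts, roots, deps):
    """Return the set of artifacts not reachable from any root.

    all_artifacts: iterable of artifact ids (e.g. "numpy/1.7/a4324sdfq32r")
    roots:         artifact ids that must be kept (e.g. used by a profile)
    deps:          dict mapping an artifact id to the ids it depends on
    """
    live = set()
    stack = list(roots)
    while stack:
        artifact = stack.pop()
        if artifact in live:
            continue
        live.add(artifact)
        stack.extend(deps.get(artifact, ()))
    return set(all_artifacts) - live
```

Only the set of candidates is computed here; actually deleting the directories would be a separate step.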

f) A tool to build "profiles", which are prefixes that mostly symlink to other software. (However, there are some non-trivial cases: for instance, I want to allow Python and Python packages to live in different prefixes but still be usable together without relying on setting PYTHONPATH; this can be handled by copying the python executable instead of symlinking it.)
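The simple case of f) could look roughly like the sketch below, which mirrors the directory tree of each prefix into a profile directory and symlinks every file. The function name and the fail-on-conflict policy are assumptions, and the non-trivial cases (such as copying the python executable) are deliberately left out.

```python
import os


def build_profile(profile_dir, prefixes):
    """Populate profile_dir with symlinks to the files in each prefix.

    Directories are created for real; files are symlinked. Directory
    symlinks inside a prefix are not followed (os.walk default).
    Raises FileExistsError if two prefixes provide the same relative path.
    """
    for prefix in prefixes:
        for dirpath, dirnames, filenames in os.walk(prefix):
            rel = os.path.relpath(dirpath, prefix)
            target_dir = os.path.normpath(os.path.join(profile_dir, rel))
            os.makedirs(target_dir, exist_ok=True)
            for fname in filenames:
                link = os.path.join(target_dir, fname)
                if os.path.lexists(link):
                    raise FileExistsError("conflict in profile: " + link)
                source = os.path.abspath(os.path.join(dirpath, fname))
                os.symlink(source, link)
```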

g) And finally, a tool like "modules" that knows about the above and allows inserting prefixes/profiles into the environment.

For desktop users in particular I think it's very important to be able to simply call /some/path/to/python, e.g., without having to have PYTHONPATH, LD_LIBRARY_PATH and so on set up correctly (which is doable, but takes some extra effort). This may not affect EasyBuild that much, but can help explain some of the design decisions in Hashdist as I go along.

Dag Sverre
