Hi EasyBuild folks (with CC to the Hashdist list),
I'm funded to work for two months on the Hashdist project (which only
exists on paper at the moment), and had a nice conversation with Kenneth
and Jens from EasyBuild today. The conclusion seems to be that Hashdist
and EasyBuild may complement one another nicely and are mostly orthogonal.
Minutes from our call:
https://github.com/hpcugent/easybuild/wiki/Notes-on-EasyBuild-HashDist-conf-call-%2820121126%29
The aim of Hashdist is to accelerate the development of existing and
future (scientific) software distribution systems, by providing some
core tools that can be shared between them.
Currently, all the software distribution systems I know of lack some of
the features I need or want. Instead of wasting my time attempting to
write yet another distribution framework and getting only 10% of the way
there, I want to develop just the features that I think are missing, and
then hope that existing systems such as EasyBuild pick them up and use
them.
Thus Hashdist is not meant to be used directly (except perhaps for a few
power-users) but rather as a component in other distribution systems.
Hashdist will be a set of loosely coupled tools. Below is my personal
wishlist, which may be adjusted as the project proceeds:
a) A source store mechanism for downloading and hashing source code (the
hashing bit being the important part).
EasyBuild already has a lot of this, but some other systems don't;
perhaps you can ignore this part (or only use it to get the hashes to
feed into b).
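For a), a minimal Python sketch of a content-addressed source store, assuming the archive has already been downloaded to a local file. The function name `store_source` and the `sha256/<hexdigest>` layout are hypothetical, chosen only for illustration; the point is that the returned key is derived from the bytes alone, so it can be handed on to b).

```python
import hashlib
import os
import shutil
import tempfile


def store_source(src_path: str, store_dir: str) -> str:
    """Copy an archive into a content-addressed store (hypothetical layout).

    Returns the key 'sha256:<hexdigest>' under which the file is stored.
    """
    h = hashlib.sha256()
    with open(src_path, "rb") as f:
        # Read in chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    digest = h.hexdigest()

    target_dir = os.path.join(store_dir, "sha256")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, digest)
    if not os.path.exists(target):
        # Copy to a temporary name first, then rename atomically, so an
        # interrupted copy never leaves a partial file under the final hash.
        fd, tmp = tempfile.mkstemp(dir=target_dir)
        os.close(fd)
        shutil.copyfile(src_path, tmp)
        os.replace(tmp, target)
    return "sha256:" + digest
```

Storing the same archive twice is a no-op and returns the same key, which is what lets b) refer to sources by hash rather than by URL or filename.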
b) A "prefix database system" based on hashing; e.g.,
~/.hashdist/artifacts/numpy/1.7/a4324sdfq32r
(The exact path-name pattern will probably be configurable, that's one
of the things I want to engage you in discussing.)
This is what you already have in EasyBuild, except that a cryptographic
hash is included in the path name. So if you make a minor change, such
as a minor-version gcc upgrade, a change to CFLAGS, or a minor patch to
your git tree that you want to try out quickly, the hash changes and a
new build/installation is created in parallel with the existing one.
The "try it out" bit is important. I want jumping around between
slightly different software stacks to be as quick and easy as using git,
and this relies on the hashes being reliable.
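A minimal sketch of how the hash in b) might be computed, assuming a build is described by a plain dict (name, version, source hashes, CFLAGS, dependency hashes, and so on). The spec fields, the JSON canonicalization, and the 12-character base32 truncation are all assumptions for illustration, not a decided Hashdist format. Changing any field, such as CFLAGS, yields a different path.

```python
import base64
import hashlib
import json
import os


def artifact_hash(spec: dict) -> str:
    """Hash a build spec (hypothetical format) into a short path component."""
    # Sorted keys and fixed separators make the serialization canonical:
    # the same spec always produces the same bytes, hence the same hash.
    blob = json.dumps(spec, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(blob).digest()
    # Lower-case base32, truncated to keep paths readable (an assumption).
    return base64.b32encode(digest).decode("ascii").lower()[:12]


def artifact_path(spec: dict, root: str = "~/.hashdist/artifacts") -> str:
    """Return e.g. ~/.hashdist/artifacts/numpy/1.7/<hash> for a spec."""
    return os.path.join(os.path.expanduser(root),
                        spec["name"], spec["version"], artifact_hash(spec))
```

Sorting keys before hashing matters: two specs that differ only in dict ordering must map to the same prefix, or the hashes would not be reliable in the sense described in b).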
c) A tool for capturing system software, hashing it, and creating a
prefix that symlinks to it; e.g.,
~/.hashdist/artifacts/gcc/4.6.3/34qw3da32e4q2 # symlinks to /usr/...
The point is simply that if the system software is upgraded, we want
that change to show up somehow in the hashes of the packages that depend
on it. (Details on this are still to be hashed out, but I have some main
ideas ready.)
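For c), a rough sketch assuming the caller already knows which system files make up the package (deciding that is the hard part and is not shown). The function `capture_system_files` is hypothetical. It hashes each file's relative name, resolved target, and contents, so an upgrade that rewrites, say, the system gcc yields a new prefix, and hence new hashes for anything built against it.

```python
import hashlib
import os


def capture_system_files(name, version, files, artifacts_root):
    """Hash system files and create a prefix of symlinks to them.

    `files` maps a path inside the new prefix (e.g. "bin/gcc") to the
    system path it should point to (e.g. "/usr/bin/gcc-4.6").
    Returns the path of the created prefix.
    """
    h = hashlib.sha256()
    for rel in sorted(files):  # sorted, so the hash is order-independent
        real = os.path.realpath(files[rel])
        # Include the name, the resolved target, and the file contents.
        h.update(rel.encode() + b"\0" + real.encode() + b"\0")
        with open(real, "rb") as f:
            h.update(hashlib.sha256(f.read()).digest())
    prefix = os.path.join(artifacts_root, name, version, h.hexdigest()[:12])

    for rel, target in files.items():
        link = os.path.join(prefix, rel)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if not os.path.lexists(link):
            os.symlink(os.path.realpath(target), link)
    return prefix
```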
d) A lightweight (optional!) jail tool to make sure that all
dependencies are explicitly stated when creating packages, so that the
following command would fail if anything were pulled in from /usr/lib
without first being accessed through a symlink created in c) above:
LD_PRELOAD=hdistjail.so gcc ....
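The real jail in d) would be a C shared library that intercepts calls such as open() via LD_PRELOAD; that part is not shown here. As a sketch of the policy such a library might enforce, assuming an access is allowed only under whitelisted roots (for instance the artifact store) or at the resolved targets of symlinks in c)-style prefixes, here is the check in Python. Both function names are hypothetical.

```python
import os


def _is_under(path: str, root: str) -> bool:
    """True if `path` equals `root` or lies inside it."""
    return path == root or path.startswith(root.rstrip(os.sep) + os.sep)


def collect_allowed_targets(prefix: str) -> set:
    """Resolved targets of all symlinks inside a c)-style prefix."""
    targets = set()
    for dirpath, dirnames, filenames in os.walk(prefix):
        for entry in dirnames + filenames:
            full = os.path.join(dirpath, entry)
            if os.path.islink(full):
                targets.add(os.path.realpath(full))
    return targets


def check_access(path: str, allowed_roots) -> bool:
    """Hypothetical jail policy: permit only accesses under allowed roots."""
    real = os.path.realpath(path)
    return any(_is_under(real, os.path.realpath(r)) for r in allowed_roots)
```

A library in /usr/lib that some prefix symlinks to passes the check; one that nothing links to fails it, which is exactly the undeclared dependency d) is meant to catch.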
e) Garbage collection to remove prefixes that are no longer used
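For e), one standard approach is mark-and-sweep: start from the roots that must be kept (for instance artifacts referenced by profiles), follow dependency edges, and report everything unreached as garbage. This sketch assumes the dependency graph is already available as a dict keyed by artifact id; how Hashdist would record it is left open, and `find_garbage` is a hypothetical name.

```python
def find_garbage(artifacts, deps, roots):
    """Return the artifacts not reachable from any root.

    artifacts: iterable of artifact ids (e.g. "numpy/1.7/a4324sdfq32r")
    deps:      dict mapping an artifact id to the ids it depends on
    roots:     ids kept alive directly, e.g. those used by profiles
    """
    live = set()
    stack = list(roots)
    # Mark phase: iterative depth-first traversal of dependency edges.
    while stack:
        a = stack.pop()
        if a in live:
            continue
        live.add(a)
        stack.extend(deps.get(a, ()))
    # Sweep phase: everything not marked is garbage.
    return set(artifacts) - live
```

Actually deleting the returned prefixes (e.g. with shutil.rmtree) is left out so the sketch stays free of side effects.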
f) A tool to build "profiles", which are prefixes that mostly consist of
symlinks to other software. (However, there are some non-trivial cases.
For instance, I want to allow Python and Python packages to live in
different prefixes but still be used together without relying on
PYTHONPATH; this can be handled by copying the python executable instead
of symlinking it.)
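A minimal sketch of f), assuming each artifact prefix uses a conventional bin/, lib/ layout and that the first prefix listed wins on file conflicts; `build_profile` and its `copy` parameter are hypothetical. Files are symlinked, except for those listed in `copy`, which are copied. As far as I understand, CPython resolves symlinks to its own executable when computing sys.prefix, so a symlinked interpreter would look for packages in its original prefix, while a copied one treats the profile as its prefix.

```python
import os
import shutil


def build_profile(profile_dir, artifact_prefixes, copy=frozenset({"bin/python"})):
    """Merge artifact prefixes into one profile directory (hypothetical).

    Every file is symlinked into the profile, except relative paths in
    `copy`, which are copied. Earlier prefixes win on conflicts.
    """
    for prefix in artifact_prefixes:
        for dirpath, _dirnames, filenames in os.walk(prefix):
            rel_dir = os.path.relpath(dirpath, prefix)
            for name in filenames:
                rel = os.path.normpath(os.path.join(rel_dir, name))
                dest = os.path.join(profile_dir, rel)
                if os.path.lexists(dest):
                    continue  # already provided by an earlier prefix
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                src = os.path.abspath(os.path.join(dirpath, name))
                if rel in copy:
                    shutil.copy2(src, dest)  # real file, not a link
                else:
                    os.symlink(src, dest)
```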
g) And finally, a tool like "modules" that knows about all of the above
and allows inserting prefixes/profiles into the environment.
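A minimal sketch of the environment handling behind g), in the spirit of what "modules" does, assuming a profile has the usual bin/ and share/man/ subdirectories. The functions and the default variable list are hypothetical. LD_LIBRARY_PATH and PYTHONPATH are deliberately not set, in keeping with the goal of binaries that work without them.

```python
import os

# Hypothetical defaults: (environment variable, subdirectory of the profile).
DEFAULT_VARS = (("PATH", "bin"), ("MANPATH", "share/man"))


def add_profile(profile_dir, env, variables=DEFAULT_VARS):
    """Return a copy of `env` with the profile prepended to search paths."""
    env = dict(env)
    for var, sub in variables:
        entry = os.path.join(profile_dir, sub)
        old = env.get(var)
        env[var] = entry + os.pathsep + old if old else entry
    return env


def remove_profile(profile_dir, env, variables=DEFAULT_VARS):
    """Return a copy of `env` with the profile's entries removed again."""
    env = dict(env)
    for var, sub in variables:
        entry = os.path.join(profile_dir, sub)
        parts = [p for p in env.get(var, "").split(os.pathsep) if p and p != entry]
        if parts:
            env[var] = os.pathsep.join(parts)
        else:
            env.pop(var, None)  # variable only held this profile's entry
    return env
```

Since a child process cannot change its parent shell's environment, a real command-line tool would print shell assignments for the shell to eval, which is how modules works.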
For desktop users in particular, I think it's very important to be able
to simply run, e.g., /some/path/to/python without having PYTHONPATH,
LD_LIBRARY_PATH and so on set up correctly (which is doable, but takes
some extra effort). This may not affect EasyBuild that much, but it can
help explain some of the design decisions in Hashdist as I go along.
Dag Sverre