Hi Fotis,
On 4/5/15, 2:21 AM, "Fotis Georgatos" <[email protected]> wrote:
>
>Hi Todd, Jack,
>
>I'll do top-posting for a change :), and keep the relevant bits visible
>below.
>
>
>The unfortunate issue with letting R handle its dependency downloads
>on its own is that it becomes difficult to ensure a consistent environment
>across sites:
>* build reproducibility is shot in the foot, since you will rely on
>external downloads
>* algorithmic reproducibility is shot in the head, since versions can
>creep forward
>
>AFAIK, cases like R's Bioconductor are hopeless, because these fellows
>are moving at a totally different angle from people who wish to rely on
>artefacts (repositories), and from the view that most EasyBuilders have,
>which is that of bit-accurate rebuilds.
Yes. There are actually a surprising number of projects that do this.
Spack has two ways to address that. One is to create a mirror:
http://scalability-llnl.github.io/spack/mirrors.html
It's relatively primitive right now, and what I'd rather have is a caching
mirror, where every archive or repo fetched is automatically archived in
the local mirror. That would be a nice setting for a site concerned with
reproducibility.
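Roughly, the caching behavior I have in mind would look like the toy sketch below (a sketch only, not Spack's implementation; `fetch_with_mirror` and the one-file-per-archive layout are made up here for illustration):

```python
import os
import shutil
import urllib.request

def fetch_with_mirror(url, mirror_dir):
    """Fetch an archive, preferring a local mirror copy.

    On the first fetch the archive is downloaded and stored in the
    mirror; every later fetch is served locally, so rebuilds no longer
    depend on the external URL still existing (or still serving the
    same bits).
    """
    os.makedirs(mirror_dir, exist_ok=True)
    local = os.path.join(mirror_dir, os.path.basename(url))
    if not os.path.exists(local):
        # First fetch: download and archive in the mirror.
        with urllib.request.urlopen(url) as resp, open(local, "wb") as out:
            shutil.copyfileobj(resp, out)
    return local  # later fetches never touch the network
```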
The other thing that Spack does that doesn't quite satisfy the
reproducibility requirement is that you can do this:
spack install [email protected]
And if it doesn't happen to have a checksum for 1.7, it'll ask you whether
you want to try to download the unknown version insecurely. So, you can
install things that are not currently known to your package files. That
only works for packages where the URL is easy to infer. That's *most*
packages, but there are always annoying cases where it's not easy, or the
external URLs for versions are not consistent. Mirrors solve that.
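The URL inference works roughly like the following toy sketch: find the version number in a known URL, turn the URL into a template, and substitute the requested version (the names `split_version` and `url_for_version` are invented for this email; Spack's real logic is more elaborate, and as noted above this only helps when a project's URLs are consistent across versions):

```python
import re

def split_version(url):
    """Find the last dotted version number in a download URL.

    Returns (version, template), where the template has '{v}' in place
    of the version.  The *last* match is used because the version in
    the file name is usually the one that varies.
    """
    m = None
    for m in re.finditer(r"\d+(\.\d+)+", url):
        pass  # keep the last match
    if m is None:
        raise ValueError("no version found in %s" % url)
    return m.group(0), url[:m.start()] + "{v}" + url[m.end():]

def url_for_version(known_url, new_version):
    """Guess the URL of a not-yet-checksummed version by substituting
    it into the template inferred from a known URL."""
    _, template = split_version(known_url)
    return template.format(v=new_version)
```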
>The concept of using symlinks to activate/deactivate software components,
>though,
>is a very valid one and not all that remote from what was discussed
>during a spring 2013 hackathon in Cyprus, about trying out "Stowed
>toolchains/bundles":
>* i.e. let many module versions exist and create mix'n'match symlinked
>metamodules
>
>If you read about the "Stow" tool on GNU's pages you will probably
>recognise this option:
>https://www.gnu.org/software/stow/
I was not familiar with Stow, but that's the general idea. I guess I
implemented that in Python. I can ignore files on activate, and
create/delete directories and symlinks on deactivate. The one thing I see
there that is necessary for Python that is NOT in Stow is the ability to
have custom per-package logic to merge files. I do this for
easy_install.pth because a lot of python packages a) want easy_install to
be there and b) annoyingly install their own easy_install at an arbitrary
version if they don't find it. Spack handles merging the code that
includes .egg files in easy_install.pth, and that is all done by some
overrides in the Python package so that extensions do not need to
implement it:
https://github.com/scalability-llnl/spack/blob/master/var/spack/packages/python/package.py
If you don't do that, some python packages will not appear to Python as
installed -- you can't just symlink them in. easy_install does something
different when you install into the python prefix vs. when you install a
package in its own prefix. I wouldn't be surprised if other languages
like R have similar issues.
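As a rough illustration of the idea -- symlink everything, but hand conflicting files to per-package merge hooks -- here is a toy sketch (the names and structure are invented for this email, not Spack's actual code):

```python
import os

def activate(ext_prefix, py_prefix, merge_hooks=None):
    """Symlink an extension's files into the python prefix, deferring
    conflicting files (e.g. easy_install.pth) to per-file merge hooks.

    merge_hooks maps a relative file path to a function
    hook(src_path, dest_path) that combines the two files in place --
    the per-package logic that plain Stow cannot express.
    """
    merge_hooks = merge_hooks or {}
    for dirpath, _, files in os.walk(ext_prefix):
        rel = os.path.relpath(dirpath, ext_prefix)
        dest_dir = os.path.join(py_prefix, rel)
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            relfile = os.path.normpath(os.path.join(rel, name))
            src = os.path.join(dirpath, name)
            dest = os.path.join(dest_dir, name)
            if relfile in merge_hooks:
                merge_hooks[relfile](src, dest)  # merge instead of link
            elif not os.path.exists(dest):
                os.symlink(src, dest)

def merge_pth(src, dest):
    """Append the extension's easy_install.pth entries to the target's,
    skipping duplicate lines."""
    existing = set()
    if os.path.exists(dest):
        with open(dest) as f:
            existing = set(f.read().splitlines())
    with open(dest, "a") as out:
        for line in open(src):
            if line.rstrip("\n") not in existing:
                out.write(line)
```

Deactivation would then be the inverse: remove the symlinks, subtract the merged entries, and prune directories that become empty.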
>f.i. this technique would allow trimming the inflated HPCBIOS_Bioinfo
>bundle of 70+ modules down to a dozen AND permit extensions management
>in a way similar to what you describe.
>This may prove to be an essential method to let $PATH, $LD_LIBRARY_PATH,
>etc. breathe, look reasonable again & give users the freedom to choose
>what they wish included and not.
>
>After all, it IS cheaper to manage a symlink farm than a bunch of
>modulefiles!
Yes. Spack basically provides a "mini-package manager" for each python
install directory, in that you can manage what is activated/deactivated
there. It seems like that's the only thing you can really do for the
python package model, aside from requiring users to stuff their PYTHONPATH
(and other paths) full of directories. They can still do that in Spack,
e.g. if the version they want doesn't match what's in the python install.
they don't have to and admins can provide sensible defaults by activating
them.
>To summarise: using symlinks may prove to be a reasonable approach to deal
>with complicated extensions or modules sets, although it is not clear yet
>what may be the side-effects of applying this logic universally. I've
>used R
>as an example, but the arguments may well apply for Python, Perl, Ruby,
>etc.
>(I claim no expertise in any of these categories)
>
>I'd be interested to hear more about Spack's experience/insight on the
>matter.
I think symlink farms are good baseline functionality for handling this
case, but our Python experience shows that it's not really enough -- some
files may need to be merged, and you might need to fake some of the
functionality of a language-specific installer like easy_install. I'm
sure I'll learn something new when I get around to handling R, perl, etc.
packages :).
-Todd
>
>cheers,
>Fotis
>
>On Apr 2, 2015, at 7:01 AM, Todd Gamblin <[email protected]> wrote:
>> Activation *mostly* just symlinks the external module (and any
>>dependency
>> extensions) into the python install, but the python package itself can
>> extend the way activation is done. That is used to provide some custom
>> logic for merging easy_install.pth files and other things that would
>> otherwise conflict within the same prefix. So, Spack python installs
>>are
>> a bit like virtualenvs, in that you can swap versions of extensions in
>>and
>> out by activating/deactivating them.
>[.. ..]
>> With this kind of design, I *think* it would just require overriding
>> activate/deactivate in the R package, unless R has something MUCH more
>> complicated than Python.
>
>
>--
>echo "sysadmin know better bash than english" | sed s/min/mins/ \
> | sed 's/better bash/bash better/' # signal detected in a CERN forum