Dear EasyBuilders,
Recently, I have been thinking a lot about the differences between
EasyBuild and Spack, and I've come to the conclusion that there are
several Spack features that should be added to EasyBuild as well...
As I've mentioned before, I strongly believe both tools have their
merits, due to the design choices that were made early on during their
development.
The perfect software installation tool for HPC environments is probably
some sort of middle ground between both approaches, and I feel this is
the first step towards that goal. We'll never reach perfection, but that
shouldn't stop us from trying to achieve it...
Last night I implemented a new module naming scheme, HashedEasyBuildMNS.
It is basically the familiar (default) EasyBuild module naming scheme with
a sprinkle on top: a SHA1 hash, added to both the module name and the name
of the installation directory, that represents the context in which the
installation is being performed.
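To make the naming concrete, here's a minimal sketch of how a hashed module name is composed, mirroring the last step of det_full_module_name in the attached implementation. The helper hashed_module_name and its arguments are hypothetical, and the context string is a placeholder for the real (much longer) installation context.

```python
from hashlib import sha1


def hashed_module_name(name, version_label, context):
    """Compose '<name>/<version_label>-<SHA1 hex digest of context>'.

    Hypothetical helper: 'version_label' stands in for det_full_ec_version(ec),
    'context' for the concatenated installation context.
    """
    # SHA1 works on bytes, so encode the context string first
    digest = sha1(context.encode()).hexdigest()  # always 40 hex characters
    return name + '/' + version_label + '-' + digest


mod_name = hashed_module_name('gzip', '1.5-goolf-1.4.10', 'example context')
print(mod_name)
```

Since det_install_subdir simply returns the hashed module name, the installation directory ends up with the same <name>/<version>-<hash> layout.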
This hashing mechanism is one of the key features of Spack. It ensures
that you'll get an entirely new installation as soon as something that
may affect the installation is different from before, including the
implementation of the installation procedure (both easyblocks and the
EasyBuild framework itself), the environment, the system configuration, etc.
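The key property is that the hash is identical for identical inputs, but changes as soon as any input differs. The sketch below illustrates this with a deliberately simplified context: only an EasyBuild version string, the easyconfig contents and the environment (the attached scheme also hashes system info, easyblocks, dependencies and all framework code). The function name context_hash and the example values are hypothetical; like the attached code, it skips $TMPDIR & co. and only concatenates environment values.

```python
from hashlib import sha1

# keys whose values change on every EasyBuild run; excluded as in HashedEasyBuildMNS
VOLATILE_KEYS = {'TEMP', 'TEMPDIR', 'TMP', 'TMPDIR'}


def context_hash(eb_version, easyconfig_txt, environ):
    """Hash a (much simplified) installation context (hypothetical helper)."""
    context = str(eb_version) + easyconfig_txt
    # iterate in sorted key order, so the hash doesn't depend on dict ordering
    for key in sorted(environ):
        if key in VOLATILE_KEYS:
            continue
        context += environ[key]
    return sha1(context.encode()).hexdigest()


env = {'PATH': '/usr/bin', 'TMPDIR': '/tmp/eb-abc123'}
base = context_hash('4.2.0', "name = 'gzip'", env)

# a different $TMPDIR does not change the hash...
same = context_hash('4.2.0', "name = 'gzip'", dict(env, TMPDIR='/tmp/eb-xyz789'))
# ...but any other change in the environment does
changed = context_hash('4.2.0', "name = 'gzip'", dict(env, PATH='/opt/bin:/usr/bin'))
print(base == same, base == changed)  # True False
```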
Not only does this new module naming scheme bring the concept of hashed
installations to EasyBuild, it also takes things a step further than
what Spack does, by (optionally) taking into account the time at which
the EasyBuild session was started.
As we have observed in the past, installing some software packages at a
later point in time may result in a different installation, for a
variety of reasons (auto-downloading of stuff during the installation,
timestamps in binary files, phase of the moon, etc.).
You can opt in to taking into account a timestamp in the SHA1 hash by
setting the $EB_HASH_TIMESTAMP environment variable to a non-empty value.
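The timestamp is captured only once, when the naming scheme module is imported, so that all module names computed within the same EasyBuild session agree (otherwise dependencies would never resolve). A minimal sketch of that opt-in logic, using a hypothetical add_timestamp helper that takes the environment as a parameter so it's easy to try out:

```python
import os
from datetime import datetime
from hashlib import sha1

# captured once per session (at import time), so every module name computed
# during the same EasyBuild session sees the same timestamp
SESSION_START = datetime.now()


def add_timestamp(context, environ=os.environ):
    """Append the session start time to the context, only if $EB_HASH_TIMESTAMP is non-empty."""
    if environ.get('EB_HASH_TIMESTAMP'):
        context += SESSION_START.strftime("%Y%m%d%H%M%S")
    return context


ctx = 'some installation context'
without_ts = sha1(add_timestamp(ctx, {}).encode()).hexdigest()
with_ts = sha1(add_timestamp(ctx, {'EB_HASH_TIMESTAMP': '1'}).encode()).hexdigest()
print(without_ts == with_ts)  # False
```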
The implementation of this new module naming scheme is attached.
Example usage (assuming that hashedeasybuildmns.py is stored in the
current working directory):
eb example.eb --robot --include-module-naming-schemes hashedeasybuildmns.py \
  --module-naming-scheme HashedEasyBuildMNS --disable-fixed-installdir-naming-scheme
Or, equivalently, using environment variables to configure EasyBuild:
export EASYBUILD_INCLUDE_MODULE_NAMING_SCHEMES=/path/to/hashedeasybuildmns.py
export EASYBUILD_MODULE_NAMING_SCHEME=HashedEasyBuildMNS
export EASYBUILD_DISABLE_FIXED_INSTALLDIR_NAMING_SCHEME=1
eb example.eb --robot
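Once installed this way, both the module name and the installation subdirectory end in a 40-character SHA1 hex digest. If you ever need to map such a name back to the familiar <name>/<version> form (e.g., in site scripts), a helper could look like the sketch below; split_hashed_module_name is hypothetical and not part of EasyBuild.

```python
import re

# a SHA1 hex digest is always 40 lowercase hexadecimal characters,
# appended to the regular module name with a '-' separator
HASHED_NAME_RE = re.compile(r'^(?P<base>.+)-(?P<sha1>[0-9a-f]{40})$')


def split_hashed_module_name(mod_name):
    """Split '<name>/<version>-<sha1>' into ('<name>/<version>', '<sha1>').

    Hypothetical helper; returns None if no trailing SHA1 hash is found.
    """
    match = HASHED_NAME_RE.match(mod_name)
    if match is None:
        return None
    return match.group('base'), match.group('sha1')


example = 'gzip/1.5-goolf-1.4.10-' + 'a' * 40
print(split_hashed_module_name(example)[0])  # gzip/1.5-goolf-1.4.10
```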
Please try using HashedEasyBuildMNS and give feedback, since we are
planning to make this the new default module naming scheme in a future
(major) release of EasyBuild.
Regards,
Kenneth
##
# Copyright 2020 Ghent University
#
# This file is part of EasyBuild,
# originally created by the HPC team of Ghent University (http://ugent.be/hpc/en),
# with support of Ghent University (http://ugent.be/hpc),
# the Flemish Supercomputer Centre (VSC) (https://www.vscentrum.be),
# Flemish Research Foundation (FWO) (http://www.fwo.be/en)
# and the Department of Economy, Science and Innovation (EWI) (http://www.ewi-vlaanderen.be/en).
#
# https://github.com/easybuilders/easybuild
#
# EasyBuild is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation v2.
#
# EasyBuild is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with EasyBuild. If not, see <http://www.gnu.org/licenses/>.
##
"""
Implementation of EasyBuild module naming scheme with additional SHA1 hash,
to ensure all aspects that could affect the installation are taken into account.
:author: Kenneth Hoste (Ghent University)
"""
import inspect
import os
import sys
from datetime import datetime
from hashlib import sha1
from importlib import import_module
import easybuild.framework
from easybuild.framework.easyblock import EasyBlock
from easybuild.framework.easyconfig.easyconfig import get_easyblock_class
from easybuild.tools.build_log import EasyBuildError
from easybuild.tools.config import build_option
from easybuild.tools.filetools import read_file
from easybuild.tools.module_naming_scheme.mns import ModuleNamingScheme
from easybuild.tools.module_naming_scheme.utilities import det_full_ec_version
from easybuild.tools.systemtools import get_system_info
from easybuild.tools.version import VERSION

# grab current time; we should do this only once,
# otherwise dependencies will never resolve because the hash keeps changing if $EB_HASH_TIMESTAMP is set
current_time = datetime.now()

system_info = get_system_info()

# computing hashes over and over again is expensive,
# so we cache them to avoid recomputing them
_cache = {}


class HashedEasyBuildMNS(ModuleNamingScheme):
    """Class implementing the default EasyBuild module naming scheme, with an additional SHA1 hash."""

    REQUIRED_KEYS = ['name', 'version', 'versionsuffix', 'toolchain', 'easyblock']

    def det_full_module_name(self, ec):
        """
        Determine full module name from given easyconfig, according to the EasyBuild module naming scheme.

        Add SHA1 hash taking into account things that may affect the installation, like:
        * EasyBuild version
        * system info
        * contents of easyconfig file being installed
        * contents of easyblocks involved in the installation
        * full module name (incl. hash) of all dependencies, including toolchain
        * all EasyBuild framework code
        * the full environment
        * timestamp that EasyBuild session was started (only if $EB_HASH_TIMESTAMP is set)

        :param ec: dict-like object with easyconfig parameter values (e.g. 'name', 'version', etc.)
        :return: string with full module name <name>/<installversion>-<SHA1 hash>,
                 e.g.: 'gzip/1.5-goolf-1.4.10-<SHA1 hash>'
        """
        # make sure that fixed-installdir-naming-scheme configuration option is *disabled!*
        if build_option('fixed_installdir_naming_scheme'):
            error_msg = "Make sure that the fixed_installdir_naming_scheme is disabled!\n"
            error_msg += "Use --disable-fixed-installdir-naming-scheme, "
            error_msg += "or set $EASYBUILD_DISABLE_FIXED_INSTALLDIR_NAMING_SCHEME=1 ."
            raise EasyBuildError(error_msg)

        full_mod_name = os.path.join(ec['name'], det_full_ec_version(ec))

        if full_mod_name in _cache:
            hashed_full_mod_name = _cache[full_mod_name]
        else:
            # EasyBuild version
            context = str(VERSION)

            # system info
            for key in sorted(system_info):
                context += str(system_info[key])

            # contents of easyconfig file being installed
            context += read_file(ec.path)

            # contents of easyblocks involved in the installation
            eb_class = get_easyblock_class(ec['easyblock'], name=ec.name)
            for klass in inspect.getmro(eb_class):
                if klass == EasyBlock:
                    break
                module = import_module(klass.__module__)
                context += read_file(module.__file__)

            # full module name (incl. hash) of all dependencies, including toolchain
            dep_mod_names = sorted([dep['full_mod_name'] for dep in ec.dependencies()])
            context += ','.join(dep_mod_names)
            # take into account that system toolchain has no module name (None value)
            context += ec.toolchain.mod_full_name or ''

            # *all* EasyBuild framework code
            easybuild_topdir = os.path.dirname(os.path.dirname(os.path.abspath(easybuild.framework.__file__)))
            for dirpath, _, filenames in sorted(os.walk(easybuild_topdir)):
                for fn in sorted(filenames):
                    if fn.endswith('.py'):
                        path = os.path.join(dirpath, fn)
                        context += read_file(path)

            # the full environment
            for key in sorted(os.environ):
                # EasyBuild sets $TMPDIR & co to a unique value on every run,
                # so we have to exclude these from the hash to avoid that it changes on every run...
                # we're making the bold assumption here that different $TMPDIR paths don't affect the installation
                if key in ['TEMP', 'TEMPDIR', 'TMP', 'TMPDIR']:
                    continue
                context += os.environ[key]

            # timestamp that EasyBuild session was started (only if $EB_HASH_TIMESTAMP is set)
            if os.environ.get('EB_HASH_TIMESTAMP'):
                context += current_time.strftime("%Y%m%d%H%M%S")

            # hashlib.sha1 expects a bytestring in Python 3, so use .encode first in that case
            if sys.version_info >= (3,):
                context = context.encode()

            # compute SHA1 hash, compose hashed module name, and cache it
            sha1_hash = sha1(context).hexdigest()
            hashed_full_mod_name = full_mod_name + '-' + sha1_hash
            _cache[full_mod_name] = hashed_full_mod_name

        return hashed_full_mod_name

    def det_install_subdir(self, ec):
        """
        Determine name of software installation subdirectory of install path.
        """
        # use hashed full module name as name for install subdir
        return self.det_full_module_name(ec)