Dear EasyBuilders,

Recently, I have been thinking a lot about the differences between EasyBuild and Spack, and I've come to the conclusion that there are several Spack features that should be added to EasyBuild as well...

As I've mentioned before, I strongly believe both tools have their merits, due to the design choices that were made early on during their development. The perfect software installation tool for HPC environments is probably some sort of middle ground between both approaches, and I feel this is the first step towards that goal. We'll never reach perfection, but that shouldn't stop us from trying to achieve it...


Last night I implemented a new module naming scheme, named HashedEasyBuildMNS. It is basically the familiar (default) EasyBuild module naming scheme with some sprinkles on top: a SHA1 hash, representing the context in which the installation is being performed, is added to both the module name and the name of the installation directory.

This hashing mechanism is one of the key features of Spack. It ensures that you'll get an entirely new installation as soon as something that may affect the installation is different from before, including the implementation of the installation procedure (both easyblocks and the EasyBuild framework itself), the environment, the system configuration, etc.
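To make the idea concrete, here is a minimal sketch of context hashing. It is not the attached implementation: the function name context_hash and the sample values are made up for illustration, and the real context contains far more than three strings.

```python
from hashlib import sha1


def context_hash(parts):
    """Return the SHA1 hex digest of the concatenated context strings."""
    context = ''.join(parts)
    # hashlib.sha1 expects bytes, so encode the string first
    return sha1(context.encode()).hexdigest()


# hypothetical context: tool version, system info, easyconfig contents
base = ['4.2.0', 'x86_64', "name = 'gzip'\nversion = '1.10'\n"]
# same context, except for a different tool version
changed = ['4.2.1', 'x86_64', "name = 'gzip'\nversion = '1.10'\n"]

print(context_hash(base))
print(context_hash(changed))
```

The two digests differ even though only one character of the context changed, which is exactly why a hashed module name ends up pointing to a fresh installation.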

Not only does this new module naming scheme bring the concept of hashed installations to EasyBuild, it also takes things a step further than what Spack does, by (optionally) taking into account the time at which the EasyBuild session was started. As we have observed in the past, installing some software packages at a later point in time may result in a different installation, for a variety of reasons (auto-downloading of stuff during the installation, timestamps in binary files, phase of the moon, etc.).

You can opt in to including a timestamp in the SHA1 hash by setting the $EB_HASH_TIMESTAMP environment variable to any non-empty value.
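As a rough sketch of how such an opt-in can work: the session start time is grabbed once, and only appended to the context when the environment variable is set. The names SESSION_START and hash_with_optional_timestamp are hypothetical, and the environ and now parameters only exist to make the sketch easy to try out.

```python
import os
from datetime import datetime
from hashlib import sha1

# grab the time only once per session, so the hash stays stable within one run
SESSION_START = datetime.now()


def hash_with_optional_timestamp(context, environ=os.environ, now=SESSION_START):
    """Return SHA1 hex digest of context, adding the session start time on opt-in."""
    # only a non-empty $EB_HASH_TIMESTAMP value enables the timestamp
    if environ.get('EB_HASH_TIMESTAMP'):
        context += now.strftime("%Y%m%d%H%M%S")
    return sha1(context.encode()).hexdigest()


# without the variable, the hash depends on the context alone
print(hash_with_optional_timestamp('abc', environ={}))
# with the variable set, the session start time is mixed in as well
print(hash_with_optional_timestamp('abc', environ={'EB_HASH_TIMESTAMP': '1'}))
```

Because the timestamp has a resolution of one second, two sessions started at different times will produce different hashes for otherwise identical contexts.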


The implementation of this new module naming scheme is attached.

Example usage (assuming that hashedeasybuildmns.py is stored in the current working directory):

    eb example.eb --robot --include-module-naming-schemes hashedeasybuildmns.py --module-naming-scheme HashedEasyBuildMNS --disable-fixed-installdir-naming-scheme

Or, equivalently using environment variables to configure EasyBuild:

    export EASYBUILD_INCLUDE_MODULE_NAMING_SCHEMES=/path/to/hashedeasybuildmns.py
    export EASYBUILD_MODULE_NAMING_SCHEME=HashedEasyBuildMNS
    export EASYBUILD_DISABLE_FIXED_INSTALLDIR_NAMING_SCHEME=1
    eb example.eb --robot


Please try using HashedEasyBuildMNS and give feedback, since we are planning to make this the new default module naming scheme in a future (major) release of EasyBuild.


regards,

Kenneth
##
# Copyright 2020 Ghent University
#
# This file is part of EasyBuild,
# originally created by the HPC team of Ghent University (http://ugent.be/hpc/en),
# with support of Ghent University (http://ugent.be/hpc),
# the Flemish Supercomputer Centre (VSC) (https://www.vscentrum.be),
# Flemish Research Foundation (FWO) (http://www.fwo.be/en)
# and the Department of Economy, Science and Innovation (EWI) (http://www.ewi-vlaanderen.be/en).
#
# https://github.com/easybuilders/easybuild
#
# EasyBuild is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation v2.
#
# EasyBuild is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with EasyBuild.  If not, see <http://www.gnu.org/licenses/>.
##
"""
Implementation of EasyBuild module naming scheme with additional SHA1 hash,
to ensure all aspects that could affect the installation are taken into account.

:author: Kenneth Hoste (Ghent University)
"""
import inspect
import os
import sys
from datetime import datetime
from hashlib import sha1
from importlib import import_module

import easybuild.framework
from easybuild.framework.easyblock import EasyBlock
from easybuild.framework.easyconfig.easyconfig import get_easyblock_class
from easybuild.tools.build_log import EasyBuildError
from easybuild.tools.config import build_option
from easybuild.tools.filetools import read_file
from easybuild.tools.module_naming_scheme.mns import ModuleNamingScheme
from easybuild.tools.module_naming_scheme.utilities import det_full_ec_version
from easybuild.tools.systemtools import get_system_info
from easybuild.tools.version import VERSION

# grab current time; we should do this only once,
# otherwise dependencies will never resolve because the hash keeps changing if $EB_HASH_TIMESTAMP is set
current_time = datetime.now()
system_info = get_system_info()

# computing hashes over and over again is expensive,
# so we cache them to avoid recomputing them
_cache = {}


class HashedEasyBuildMNS(ModuleNamingScheme):
    """Class implementing the default EasyBuild module naming scheme, extended with a SHA1 hash of the context."""

    REQUIRED_KEYS = ['name', 'version', 'versionsuffix', 'toolchain', 'easyblock']

    def det_full_module_name(self, ec):
        """
        Determine full module name from given easyconfig, according to the EasyBuild module naming scheme.

        Add SHA1 hash taking into account things that may affect the installation, like:
        * EasyBuild version
        * system info
        * contents of easyconfig file being installed
        * contents of easyblocks involved in the installation
        * full module name (incl. hash) of all dependencies, including toolchain
        * all EasyBuild framework code
        * the full environment
        * timestamp that EasyBuild session was started (only if $EB_HASH_TIMESTAMP is set)

        :param ec: dict-like object with easyconfig parameter values (e.g. 'name', 'version', etc.)
        :return: string with full module name <name>/<installversion>-<SHA1 hash>, e.g.: 'gzip/1.5-goolf-1.4.10-<SHA1 hash>'
        """
        
        # make sure that fixed-installdir-naming-scheme configuration option is *disabled!*
        if build_option('fixed_installdir_naming_scheme'):
            error_msg = "Make sure that the fixed_installdir_naming_scheme is disabled!\n"
            error_msg += "Use --disable-fixed-installdir-naming-scheme, "
            error_msg += "or set $EASYBUILD_DISABLE_FIXED_INSTALLDIR_NAMING_SCHEME=1 ."
            raise EasyBuildError(error_msg)

        full_mod_name = os.path.join(ec['name'], det_full_ec_version(ec))

        if full_mod_name in _cache:
            hashed_full_mod_name = _cache[full_mod_name]

        else:
            # EasyBuild version
            context = str(VERSION)

            # system info
            for key in sorted(system_info):
                context += str(system_info[key])

            # contents of easyconfig file being installed
            context += read_file(ec.path)

            # contents of easyblocks involved in the installation
            eb_class = get_easyblock_class(ec['easyblock'], name=ec.name)
            for klass in inspect.getmro(eb_class):
                # stop at the generic EasyBlock class, which is covered by the framework code below
                if klass is EasyBlock:
                    break
                module = import_module(klass.__module__)
                context += read_file(module.__file__)
            
            # full module name (incl. hash) of all dependencies, including toolchain
            dep_mod_names = sorted([dep['full_mod_name'] for dep in ec.dependencies()])
            context += ','.join(dep_mod_names)

            # take into account that system toolchain has no module name (None value)
            context += ec.toolchain.mod_full_name or ''
            
            # *all* EasyBuild framework code
            easybuild_topdir = os.path.dirname(os.path.dirname(os.path.abspath(easybuild.framework.__file__)))
            for dirpath, _, filenames in sorted(os.walk(easybuild_topdir)):
                for fn in sorted(filenames):
                    if fn.endswith('.py'):
                        path = os.path.join(dirpath, fn)
                        context += read_file(path)
            
            # the full environment
            for key in sorted(os.environ):
                # EasyBuild sets $TMPDIR & co to a unique value on every run,
                # so we must exclude them from the hash, otherwise it would change on every run...
                # this rests on the bold assumption that different $TMPDIR paths don't affect the installation
                if key in ['TEMP', 'TEMPDIR', 'TMP', 'TMPDIR']:
                    continue
                context += os.environ[key]
            
            # timestamp that EasyBuild session was started (only if $EB_HASH_TIMESTAMP is set)
            if os.environ.get('EB_HASH_TIMESTAMP'):
                context += current_time.strftime("%Y%m%d%H%M%S")

            # hashlib.sha1 expects a bytestring in Python 3, so use .encode first in that case
            if sys.version_info >= (3,):
                context = context.encode()

            # compute SHA1 hash, compose hashed module name, and cache it
            sha1_hash = sha1(context).hexdigest()
            hashed_full_mod_name = full_mod_name + '-' + sha1_hash
            _cache[full_mod_name] = hashed_full_mod_name

        return hashed_full_mod_name

    def det_install_subdir(self, ec):
        """
        Determine name of software installation subdirectory of install path.
        """
        # use hashed full module name as name for install subdir
        return self.det_full_module_name(ec)
