Dear Agustín,

The fundamental problem is indeed that you're building software on one type of CPU, and then trying to run it on another.

Can you share some more details on what type of CPU is in the master node and slave nodes?

If you can, try using the archspec tool (see https://github.com/archspec/archspec, install with "pip3 install archspec", then run "archspec cpu").

Or share the output of the following commands:

grep 'model name' /proc/cpuinfo  | head -1

grep flags /proc/cpuinfo | head -1
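
If it helps to compare the two nodes directly, here is a minimal sketch in Python (not part of EasyBuild or archspec). It assumes you have saved the contents of /proc/cpuinfo from each node as text; the function names parse_cpu_flags and missing_flags are made up for this illustration.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the CPU feature flags from the first 'flags' line as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (e.g. empty input)
    return set()


def missing_flags(build_flags, run_flags):
    """Return, sorted, the flags the build node has but the run node lacks."""
    return sorted(build_flags - run_flags)
```

Any flag reported by missing_flags(build, run) is an instruction set extension that a binary tuned for the build node may use but the run node cannot execute, which is the usual cause of "Illegal instruction".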


By default, EasyBuild optimizes for the specific CPU of the build node. You can prevent that by building with "eb --optarch=GENERIC"; see https://docs.easybuild.io/en/latest/Controlling_compiler_optimization_flags.html.

George's suggestion is better/easier though: building on the oldest node should help you too...


regards,

Kenneth

On 02/06/2021 22:20, Agustín Aucar wrote:
Dear George,

Thanks for your response. A few days ago, I tried to compile the code on a slave node, but it didn't solve the problem...

Best,
Agustín

On Wed, 2 Jun 2021 at 11:41, George Tsouloupas (<[email protected]>) wrote:

    Hi,

    In a similar situation we ended up just building the software on the
    "older" cpu (i.e. the "slave" in your case)

    G.


    George Tsouloupas, PhD
    HPC Facility Technical Director
    The Cyprus Institute
    tel: +357 22208688

    On 6/2/21 4:22 PM, Agustín Aucar wrote:
    Dear EasyBuild experts,

    Firstly, thank you for your very nice work!

    I'm trying to compile PySCF with the following *.eb file:

    easyblock = 'CMakeMakeCp'

    name = 'PySCF'
    version = '2.0.0a'
    versionsuffix = '-Python-%(pyver)s'

    homepage = 'http://www.pyscf.org'
    description = "PySCF is an open-source collection of electronic structure modules powered by Python."

    toolchain = {'name': 'foss', 'version': '2020b'}

    source_urls = ['https://github.com/pyscf/pyscf/archive/']
    sources = ['v%(version)s.tar.gz']
    checksums = ['20f4c9faf65436a97f9dfc8099d3c79b988b0a2c5374c701fbe35abc6fad4922']

    builddependencies = [('CMake', '3.18.4')]

    dependencies = [
        ('Python', '3.8.6'),
        ('SciPy-bundle', '2020.11'),  # for numpy, scipy
        ('h5py', '3.1.0'),
        ('qcint', '4.0.6', versionsuffix),
        ('libxc', '5.1.3'),
        ('XCFun', '2.1.1'),
    ]

    start_dir = 'pyscf/lib'

    separate_build_dir = True

    configopts = "-DBUILD_LIBCINT=OFF -DBUILD_LIBXC=OFF -DBUILD_XCFUN=OFF "

    prebuildopts = "export PYSCF_INC_DIR=$EBROOTQCINT/include:$EBROOTLIBXC/lib && "

    files_to_copy = ['pyscf']

    sanity_check_paths = {
        'files': ['pyscf/__init__.py'],
        'dirs': ['pyscf/data', 'pyscf/lib'],
    }

    sanity_check_commands = ["python -c 'import pyscf'"]

    modextrapaths = {'PYTHONPATH': '', 'PYSCF_EXT_PATH': ''}

    moduleclass = 'chem'


    Although the module is created, I have trouble running it on any
    node other than the master. On the master, when I load the module
    and run the code, everything works:

    module load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
    python
    from pyscf import gto, scf
    mol = gto.M(atom='H 0 0 0; H 0 0 1')
    mf = scf.RHF(mol).run()

    but when I try to run it on a node different from the master, I get:

    Python 3.8.6 (default, Jun  1 2021, 16:43:49)
    [GCC 10.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from pyscf import gto, scf
    >>> mol = gto.M(atom='H 0 0 0; H 0 0 1')
    >>> mf = scf.RHF(mol).run()
    Illegal instruction (core dumped)

    From what I have read in various places, this seems to be related
    to the different architectures of our master and slave nodes.

    If I execute

    grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | {
      read FLAGS; OPT="-march=native"
      for flag in $FLAGS; do
        case "$flag" in
          "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";;
        esac
      done
      MODOPT=${OPT//_/\.}; echo "$MODOPT"
    }

    on the slaves I get: -march=native -mssse3 -mfma -mcx16 -msse4.1
    -msse4.2 -mpopcnt -mavx -mavx2

    whereas on the master node we have: -march=native -mcx16

    I tried to compile PySCF by adding these lines to my *.eb file:

    configopts += "-DBUILD_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
    configopts += "-DCMAKE_C_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
    configopts += "-DCMAKE_CXX_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
    configopts += "-DCMAKE_FORTRAN_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2'"

    but in that case the code runs neither on the master nor on the
    slaves.


    I'm sorry if it is a stupid question. I am far from being a system
    admin...

    Thanks a lot for your help.

    Dr. Agustín Aucar
    Institute for Modeling and Innovative Technologies - Argentina
