On 03/06/2021 20:46, Agustín Aucar wrote:
Dear Kenneth,

Thank you so much for your kind reply.


On Thu, 3 Jun 2021 at 15:18, Kenneth Hoste (<[email protected]>) wrote:

    Dear Agustín,

    I'm not sure if there's an easy way to determine which library is
    causing the "Illegal instruction" error, but it's possibly not a single
    specific library, but several...

    I suggest you try re-installing all modules on the slave nodes (the
    oldest CPUs), if that's feasible.


I think the oldest CPUs are not those of the slave nodes but from the master.

Master node:

model name : Dual-Core AMD Opteron(tm) Processor 2214

Slaves:

model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

As far as I can see on this website <http://cpuboss.com/cpus/Intel-Xeon-E5-2620-vs-AMD-Opteron-2214>, the AMD CPU (our master node) is older than the Intel ones (slaves). Am I wrong?


No, you're right, I overlooked that.

This probably means you're in trouble, in some sense...

The AMD processor supports instructions that the Intel one in the slaves doesn't support, and vice versa.

So building on the slaves with -march=native (which is what EasyBuild does by default) means the installations can only be used on the slaves.
And the same goes for the master...
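To make the "and vice versa" concrete, here is a minimal sketch (not part of the original thread) that diffs the /proc/cpuinfo flag lists posted further down in this thread. The ISA_FLAGS set is a hypothetical, non-exhaustive selection of flags that name instruction-set extensions a compiler may emit code for; most of the other flags describe kernel or platform features.

```python
from __future__ import annotations

# Flag lists copied from the /proc/cpuinfo output quoted later in this thread.
MASTER_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm "
    "3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm "
    "cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall"
)
SLAVE_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb "
    "rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology "
    "nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx "
    "est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic "
    "movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm "
    "3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti "
    "intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid "
    "ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm "
    "rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc "
    "cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d"
)

# Hypothetical, non-exhaustive set of instruction-set extension flags.
ISA_FLAGS = {
    "mmxext", "3dnow", "3dnowext", "ssse3", "sse4_1", "sse4_2", "popcnt",
    "avx", "avx2", "fma", "f16c", "bmi1", "bmi2", "movbe", "aes", "pclmulqdq",
}


def isa_diff(a: str, b: str) -> tuple[set[str], set[str]]:
    """Return (ISA flags only in a, ISA flags only in b)."""
    fa, fb = set(a.split()), set(b.split())
    return (fa - fb) & ISA_FLAGS, (fb - fa) & ISA_FLAGS


master_only, slave_only = isa_diff(MASTER_FLAGS, SLAVE_FLAGS)
print("only on master:", sorted(master_only))
print("only on slaves:", sorted(slave_only))
```

Each side has extensions the other lacks (3DNow!/MMX extensions on the Opteron; SSSE3, SSE4.x, AVX/AVX2, FMA and more on the Xeon), so a -march=native build on either node may contain instructions that raise "Illegal instruction" on the other.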



    When you use "eb --force", only the easyconfig files specified to the eb
    command are reinstalled.
    There's no command line option to re-install everything, since it's
    pretty rare to actually have to do this.

    The easiest way would be to remove the module files, and then reinstall
    PySCF with "eb --robot".


OK. Then I will remove all *.lua files (from the 36 modules, including *foss* and *GCCcore*), and then reinstall all of them, this time from a slave node.
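Before deleting anything, a dry run helps confirm exactly which module files would go. The following is a minimal Python sketch, not an EasyBuild feature: it assumes the categorized <category>/<name>/<version>.lua layout visible in the module names of this thread (e.g. chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6), and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def select_module_files(paths: Iterable[Path], names: Iterable[str]) -> list[Path]:
    """Keep .lua files whose parent directory is one of the given software
    names, assuming a <category>/<name>/<version>.lua layout."""
    wanted = set(names)
    return sorted(p for p in paths if p.suffix == ".lua" and p.parent.name in wanted)


def find_module_files(module_root: str | Path, names: Iterable[str]) -> list[Path]:
    """Dry run: list matching module files under module_root; deletes nothing."""
    return select_module_files(Path(module_root).rglob("*.lua"), names)
```

For example, calling find_module_files() on a (hypothetical) module root such as Path.home() / ".local/easybuild/modules/all" with ["PySCF", "qcint", "libxc", "XCFun"] and printing the result shows the candidate files; adjust the root to the actual install path and only remove the files once the list looks right.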


I will report my results.

Thank you for your valuable advice!

Agustín

    regards,

    Kenneth

    On 03/06/2021 19:48, Agustín Aucar wrote:
     > Dear EasyBuild experts,
     >
     > I tried to recompile some of the dependencies of the PySCF code by using:
     >
     > eb name-of-file.eb --optarch=GENERIC -r --force
     >
     > but the results are still the same. I recompiled 5 or 6 of the 36
     > "dependent" modules... Is there a way to somehow estimate which module
     > is causing this problem to avoid recompiling each of the 36 modules?
     >
     > The loaded modules (module purge && module
     > load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6) are
     >
     > Currently Loaded Modules:
     >    1) compiler/GCCcore/10.2.0
     >    2) lib/zlib/1.2.11-GCCcore-10.2.0
     >    3) tools/binutils/2.35-GCCcore-10.2.0
     >    4) compiler/GCC/10.2.0
     >    5) tools/numactl/2.0.13-GCCcore-10.2.0
     >    6) tools/XZ/5.2.5-GCCcore-10.2.0
     >    7) lib/libxml2/2.9.10-GCCcore-10.2.0
     >    8) system/libpciaccess/0.16-GCCcore-10.2.0
     >    9) system/hwloc/2.2.0-GCCcore-10.2.0
     >   10) lib/libevent/2.1.12-GCCcore-10.2.0
     >   11) lib/UCX/1.9.0-GCCcore-10.2.0
     >   12) lib/libfabric/1.11.0-GCCcore-10.2.0
     >   13) lib/PMIx/3.1.5-GCCcore-10.2.0
     >   14) mpi/OpenMPI/4.0.5-GCC-10.2.0
     >   15) numlib/OpenBLAS/0.3.12-GCC-10.2.0
     >   16) toolchain/gompi/2020b
     >   17) numlib/FFTW/3.3.8-gompi-2020b
     >   18) numlib/ScaLAPACK/2.1.0-gompi-2020b
     >   19) toolchain/foss/2020b
     >   20) tools/bzip2/1.0.8-GCCcore-10.2.0
     >   21) devel/ncurses/6.2-GCCcore-10.2.0
     >   22) lib/libreadline/8.0-GCCcore-10.2.0
     >   23) lang/Tcl/8.6.10-GCCcore-10.2.0
     >   24) devel/SQLite/3.33.0-GCCcore-10.2.0
     >   25) math/GMP/6.2.0-GCCcore-10.2.0
     >   26) lib/libffi/3.3-GCCcore-10.2.0
     >   27) lang/Python/3.8.6-GCCcore-10.2.0
     >   28) lib/pybind11/2.6.0-GCCcore-10.2.0
     >   29) lang/SciPy-bundle/2020.11-foss-2020b
     >   30) tools/Szip/2.1.1-GCCcore-10.2.0
     >   31) data/HDF5/1.10.7-gompi-2020b
     >   32) data/h5py/3.1.0-foss-2020b
     >   33) chem/qcint/4.0.6-foss-2020b-Python-3.8.6
     >   34) chem/libxc/5.1.3-GCC-10.2.0
     >   35) chem/XCFun/2.1.1-GCCcore-10.2.0
     >   36) chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
     >
     >
     > Thank you in advance for any help,
     > Agustín
     >
     > On Thu, 3 Jun 2021 at 08:03, Agustín Aucar (<[email protected]>) wrote:
     >
     >     Dear Åke and Kenneth,
     >
     >     Thank you very much for your replies.
     >
     >     On Thu, 3 Jun 2021 at 04:00, Kenneth Hoste (<[email protected]>) wrote:
     >
     >         Dear Agustín,
     >
     >         The fundamental problem is indeed that you're building software
     >         on one type of CPU, and then trying to run it on another.
     >
     >         Can you share some more details on what type of CPU is in the
     >         master node and slave nodes?
     >
     >         If you can, try using the archspec tool (see
     >         https://github.com/archspec/archspec; install it with
     >         "pip3 install archspec", then run "archspec cpu").
     >
     >         Or share the output of the following commands:
     >
     >         grep 'model name' /proc/cpuinfo  | head -1
     >
     >
     >         grep flags /proc/cpuinfo | head -1
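Where archspec is not available, the same two pieces of information the grep commands print (the first "model name" and "flags" entries) can be pulled out with a few lines of Python. This is a minimal sketch; cpu_summary is a hypothetical helper name, and it parses text passed to it so it can be fed /proc/cpuinfo from each node.

```python
from __future__ import annotations


def cpu_summary(cpuinfo_text: str) -> tuple[str, set[str]]:
    """Return (model name, set of flags) for the first processor entry."""
    model: str = ""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "model name" and not model:
            model = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
        if model and flags:
            break  # like "head -1": only the first processor matters here
    return model, flags
```

On a node, something like `with open("/proc/cpuinfo") as f: model, flags = cpu_summary(f.read())` gives the model string and a set of flags that can be compared directly between machines.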
     >
     >
     >     Master node:
     >
     >     model name : Dual-Core AMD Opteron(tm) Processor 2214
     >
     >     flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
     >     cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
     >     fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid
     >     pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch
     >     vmmcall
     >
     >
     >     Slaves:
     >
     >     model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
     >
     >     flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
     >     cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
     >     syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
     >     rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
     >     dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
     >     pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
     >     xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb
     >     cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp
     >     tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
     >     bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap
     >     intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
     >     dtherm ida arat pln pts md_clear flush_l1d
     >
     >         You can also try controlling the optimizations that EasyBuild
     >         does by default, to prevent it from building for the specific
     >         CPU in the build node, using "eb --optarch=GENERIC"; see
     >         https://docs.easybuild.io/en/latest/Controlling_compiler_optimization_flags.html.
     >
     >
     >     I tried doing
     >
     >     eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --optarch=GENERIC -r --force
     >
     >     but the problem is still the same. Maybe the problem is not in this
     >     particular code (PySCF) but in some of its dependencies. Is there
     >     something like a "--force" flag to force dependencies to recompile?
     >
     >         George's suggestion is better/easier though: building on the
     >         oldest node should help you too...
     >
     >
     >     I tried this a couple of days ago, but it didn't resolve the
     >     problem. In fact, when doing so, I cannot run the code on the master
     >     (as expected), but I cannot run it on the slaves either...
     >
     >         regards,
     >
     >         Kenneth
     >
     >
     >
     >     Thank you for your help!
     >
     >     Agustín
     >
     >         On 02/06/2021 22:20, Agustín Aucar wrote:
     >          > Dear George,
     >          >
     >          > Thanks for your response. A few days ago, I tried to compile
     >          > the code on a slave node, but it didn't solve the problem...
     >          >
     >          > Best,
     >          > Agustín
     >          >
     >          > On Wed, 2 Jun 2021 at 11:41, George Tsouloupas (<[email protected]>) wrote:
     >          >
     >          >     Hi,
     >          >
     >          >     In a similar situation we ended up just building the
     >          >     software on the "older" cpu (i.e. the "slave" in your case)
     >          >
     >          >     G.
     >          >
     >          >
     >          >     George Tsouloupas, PhD
     >          >     HPC Facility Technical Director
     >          >     The Cyprus Institute
     >          >     tel: +357 22208688
     >          >
     >          >     On 6/2/21 4:22 PM, Agustín Aucar wrote:
     >          >>     Dear EasyBuild experts,
     >          >>
     >          >>     Firstly, thank you for your very nice work!
     >          >>
     >          >>     I'm trying to compile PySCF with the following *.eb file:
     >          >>
     >          >>     easyblock = 'CMakeMakeCp'
     >          >>
     >          >>     name = 'PySCF'
     >          >>     version = '2.0.0a'
     >          >>     versionsuffix = '-Python-%(pyver)s'
     >          >>
     >          >>     homepage = 'http://www.pyscf.org'
     >          >>     description = "PySCF is an open-source collection of electronic structure modules powered by Python."
     >          >>
     >          >>     toolchain = {'name': 'foss', 'version': '2020b'}
     >          >>
     >          >>     source_urls = ['https://github.com/pyscf/pyscf/archive/']
     >          >>     sources = ['v%(version)s.tar.gz']
     >          >>     checksums = ['20f4c9faf65436a97f9dfc8099d3c79b988b0a2c5374c701fbe35abc6fad4922']
     >          >>
     >          >>     builddependencies = [('CMake', '3.18.4')]
     >          >>
     >          >>     dependencies = [
     >          >>         ('Python', '3.8.6'),
     >          >>         ('SciPy-bundle', '2020.11'),  # for numpy, scipy
     >          >>         ('h5py', '3.1.0'),
     >          >>         ('qcint', '4.0.6', versionsuffix),
     >          >>         ('libxc', '5.1.3'),
     >          >>         ('XCFun', '2.1.1'),
     >          >>     ]
     >          >>
     >          >>     start_dir = 'pyscf/lib'
     >          >>
     >          >>     separate_build_dir = True
     >          >>
     >          >>     configopts = "-DBUILD_LIBCINT=OFF -DBUILD_LIBXC=OFF -DBUILD_XCFUN=OFF "
     >          >>
     >          >>     prebuildopts = "export PYSCF_INC_DIR=$EBROOTQCINT/include:$EBROOTLIBXC/lib && "
     >          >>
     >          >>     files_to_copy = ['pyscf']
     >          >>
     >          >>     sanity_check_paths = {
     >          >>         'files': ['pyscf/__init__.py'],
     >          >>         'dirs': ['pyscf/data', 'pyscf/lib'],
     >          >>     }
     >          >>
     >          >>     sanity_check_commands = ["python -c 'import pyscf'"]
     >          >>
     >          >>     modextrapaths = {'PYTHONPATH': '', 'PYSCF_EXT_PATH': ''}
     >          >>
     >          >>     moduleclass = 'chem'
     >          >>
     >          >>
     >          >>     Even though the module is created, I am having trouble
     >          >>     running it on a node other than the master. When I load the
     >          >>     module and run the code on the master, everything works:
     >          >>
     >          >>     module load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
     >          >>     python
     >          >>     from pyscf import gto, scf
     >          >>     mol = gto.M(atom='H 0 0 0; H 0 0 1')
     >          >>     mf = scf.RHF(mol).run()
     >          >>
     >          >>     but when I try to run it on a node other than the master, I get:
     >          >>
     >          >>     Python 3.8.6 (default, Jun  1 2021, 16:43:49)
     >          >>     [GCC 10.2.0] on linux
     >          >>     Type "help", "copyright", "credits" or "license" for
     >         more information.
     >          >>     >>> from pyscf import gto, scf
     >          >>     >>> mol = gto.M(atom='H 0 0 0; H 0 0 1')
     >          >>     >>> mf = scf.RHF(mol).run()
     >          >>     Illegal instruction (core dumped)
     >          >>
     >          >>     From what I have read in various places, it seems to be related
     >          >>     to the different architectures of our master and slave nodes.
     >          >>
     >          >>     If I execute
     >          >>
     >          >>     grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 |
     >          >>       tr '[:upper:]' '[:lower:]' | {
     >          >>         read FLAGS; OPT="-march=native"
     >          >>         for flag in $FLAGS; do
     >          >>           case "$flag" in
     >          >>             "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";;
     >          >>           esac
     >          >>         done
     >          >>         MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
     >          >>
     >          >>     on the slaves I get: -march=native -mssse3 -mfma -mcx16
     >          >>     -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2
     >          >>
     >          >>     whereas on the master node we have: -march=native -mcx16
     >          >>
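The shell one-liner above can be mirrored in Python, which makes its logic easier to follow: start from -march=native, append -m<flag> for each of eight SIMD-related flags the CPU reports (in /proc/cpuinfo order), then turn "_" into "." (sse4_1 becomes sse4.1). This is a sketch of that pipeline only; native_opts and CHECKED_FLAGS are hypothetical names, not part of EasyBuild.

```python
# Same flag list as the case statement in the one-liner.
CHECKED_FLAGS = ("sse4_1", "sse4_2", "ssse3", "fma", "cx16", "popcnt", "avx", "avx2")


def native_opts(flags_line: str) -> str:
    """Mimic the one-liner for a /proc/cpuinfo "flags" line."""
    # Keep what follows the first ':' (like `cut -d ":" -f 2`), then lowercase.
    values = flags_line.split(":", 1)[-1].lower().split()
    opts = ["-march=native"]
    # Append -m<flag> in the order the flags appear, as the shell loop does.
    opts += [f"-m{flag}" for flag in values if flag in CHECKED_FLAGS]
    return " ".join(opts).replace("_", ".")
```

One consequence worth noting: -march=native already enables every extension the build host supports, so adding these -m options on the slaves does not make the result more portable; it only hard-codes requirements such as AVX2 that the master's Opteron does not provide.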
     >          >>     I tried to compile PySCF by adding these lines to my *.eb file:
     >          >>
     >          >>     configopts += "-DBUILD_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
     >          >>     configopts += "-DCMAKE_C_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
     >          >>     configopts += "-DCMAKE_CXX_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
     >          >>     configopts += "-DCMAKE_Fortran_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2'"
     >          >>
     >          >>     but in that case the code runs neither on the master nor on the
     >          >>     slaves.
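A small sketch that lists which of those -m options name an extension the master's CPU (flag list quoted earlier in this thread) does not report. It assumes the GCC option name matches the /proc/cpuinfo flag after mapping "." to "_", which holds for the options used here; missing_features is a hypothetical helper. It accounts for the failure on the master only, not for the failure on the slaves.

```python
from __future__ import annotations


def missing_features(build_flags: str, cpu_flags: set[str]) -> list[str]:
    """Return the -m<feature> options whose feature the CPU does not advertise.

    Options carrying a value (e.g. -march=native, -mtune=generic) are skipped.
    """
    missing = []
    for opt in build_flags.split():
        if not opt.startswith("-m") or "=" in opt:
            continue
        # GCC spells sse4_1 as sse4.1; /proc/cpuinfo uses the underscore form.
        if opt[2:].replace(".", "_") not in cpu_flags:
            missing.append(opt)
    return missing


MASTER = set(
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm "
    "3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm "
    "cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall".split()
)
FLAGS = "-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2"
print(missing_features(FLAGS, MASTER))
```

Only -mcx16 survives the check, so code built with these flags can use instructions the Opteron cannot execute.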
     >          >>
     >          >>
     >          >>     I'm sorry if it is a stupid question. I am far from being a system
     >          >>     admin...
     >          >>
     >          >>     Thanks a lot for your help.
     >          >>
     >          >>     Dr. Agustín Aucar
     >          >>     Institute for Modeling and Innovative Technologies - Argentina
     >          >
     >
