Dear Kenneth,

Thank you so much for your kind reply.


On Thu, Jun 3, 2021 at 15:18, Kenneth Hoste (<[email protected]>)
wrote:

> Dear Agustín,
>
> I'm not sure if there's an easy way to determine which library is
> causing the "Illegal instruction" error, but it's possibly not a single
> specific library, but several...
>
> I suggest you try re-installing all modules on the slave nodes (the
> oldest CPUs), if that's feasible.
>

I think the oldest CPUs are not those of the slave nodes but those of the
master.

Master node:

model name : Dual-Core AMD Opteron(tm) Processor 2214

Slaves:

model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

As far as I can see on this website
<http://cpuboss.com/cpus/Intel-Xeon-E5-2620-vs-AMD-Opteron-2214>, the AMD
CPU (our master node) is older than the Intel ones (the slaves). Am I wrong?
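
In case it helps, this is how I am comparing the ISA level of the two nodes; just a rough sketch based on /proc/cpuinfo (the flag list below is taken from the outputs further down in this thread, it is not exhaustive):

```shell
# Rough check of which SIMD extensions this node supports, based on the
# flags line of /proc/cpuinfo. Run on each node; the node missing
# avx/avx2/sse4_* is the one that forces the most generic build.
flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
for f in sse4_1 sse4_2 avx avx2 fma; do
  case " $flags " in
    *" $f "*) echo "$f: yes" ;;
    *)        echo "$f: no"  ;;
  esac
done
```

(The archspec tool Kenneth mentions below gives a cleaner answer, but this works without installing anything.)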

> When you use "eb --force", only the easyconfig files specified to the eb
> command are reinstalled.
> There's no command line option to re-install everything, since it's
> pretty rare to actually have to do this.
>
> The easiest way would be to remove the module files, and then reinstall
> PySCF with "eb --robot".
>

OK. Then, I will remove all *.lua files (for all 36 modules, including
*foss* and *GCCcore*), and then reinstall all of them, this time from a
slave node.
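
Concretely, the plan looks something like this (a sketch; the modules path below is an assumption, it should be whatever installpath our EasyBuild configuration actually uses):

```shell
# Sketch of the clean-up + reinstall plan. MODULES_DIR is a hypothetical
# path; substitute the real module tree of the EasyBuild installation.
MODULES_DIR="/opt/easybuild/modules/all"

# 1) Remove the generated *.lua module files so EasyBuild treats the
#    36 modules as not installed (guarded so this is a no-op elsewhere).
if [ -d "$MODULES_DIR" ]; then
  find "$MODULES_DIR" -name '*.lua' -print -delete
fi

# 2) From a slave node, rebuild PySCF plus all missing dependencies,
#    with generic optimizations so every node can run the binaries.
if command -v eb >/dev/null 2>&1; then
  eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --robot --optarch=GENERIC
fi
```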


I will report my results.

Thank you for your valuable advice!

Agustín


> regards,
>
> Kenneth
>
> On 03/06/2021 19:48, Agustín Aucar wrote:
> > Dear EasyBuild experts,
> >
> > I tried to recompile some of the dependencies of the PySCF code by using:
> >
> > eb name-of-file.eb --optarch=GENERIC -r --force
> >
> > but the results are still the same. I recompiled 5 or 6 of the 36
> > "dependent" modules... Is there a way to somehow estimate which module
> > is causing this problem to avoid recompiling each of the 36 modules?
> >
> > The loaded modules (module purge && module
> > load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6) are
> >
> > Currently Loaded Modules:
> >    1) compiler/GCCcore/10.2.0
> >    2) lib/zlib/1.2.11-GCCcore-10.2.0
> >    3) tools/binutils/2.35-GCCcore-10.2.0
> >    4) compiler/GCC/10.2.0
> >    5) tools/numactl/2.0.13-GCCcore-10.2.0
> >    6) tools/XZ/5.2.5-GCCcore-10.2.0
> >    7) lib/libxml2/2.9.10-GCCcore-10.2.0
> >    8) system/libpciaccess/0.16-GCCcore-10.2.0
> >    9) system/hwloc/2.2.0-GCCcore-10.2.0
> >   10) lib/libevent/2.1.12-GCCcore-10.2.0
> >   11) lib/UCX/1.9.0-GCCcore-10.2.0
> >   12) lib/libfabric/1.11.0-GCCcore-10.2.0
> >   13) lib/PMIx/3.1.5-GCCcore-10.2.0
> >   14) mpi/OpenMPI/4.0.5-GCC-10.2.0
> >   15) numlib/OpenBLAS/0.3.12-GCC-10.2.0
> >   16) toolchain/gompi/2020b
> >   17) numlib/FFTW/3.3.8-gompi-2020b
> >   18) numlib/ScaLAPACK/2.1.0-gompi-2020b
> >   19) toolchain/foss/2020b
> >   20) tools/bzip2/1.0.8-GCCcore-10.2.0
> >   21) devel/ncurses/6.2-GCCcore-10.2.0
> >   22) lib/libreadline/8.0-GCCcore-10.2.0
> >   23) lang/Tcl/8.6.10-GCCcore-10.2.0
> >   24) devel/SQLite/3.33.0-GCCcore-10.2.0
> >   25) math/GMP/6.2.0-GCCcore-10.2.0
> >   26) lib/libffi/3.3-GCCcore-10.2.0
> >   27) lang/Python/3.8.6-GCCcore-10.2.0
> >   28) lib/pybind11/2.6.0-GCCcore-10.2.0
> >   29) lang/SciPy-bundle/2020.11-foss-2020b
> >   30) tools/Szip/2.1.1-GCCcore-10.2.0
> >   31) data/HDF5/1.10.7-gompi-2020b
> >   32) data/h5py/3.1.0-foss-2020b
> >   33) chem/qcint/4.0.6-foss-2020b-Python-3.8.6
> >   34) chem/libxc/5.1.3-GCC-10.2.0
> >   35) chem/XCFun/2.1.1-GCCcore-10.2.0
> >   36) chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
> >
> >
> > Thank you in advance for any help,
> > Agustín
> >
> > On Thu, Jun 3, 2021 at 8:03, Agustín Aucar (<[email protected]>)
> > wrote:
> >
> >     Dear Åke and Kenneth,
> >
> >     Thank you very much for your replies.
> >
> >     On Thu, Jun 3, 2021 at 4:00, Kenneth Hoste
> >     (<[email protected]>) wrote:
> >
> >         Dear Agustín,
> >
> >         The fundamental problem is indeed that you're building software
> >         on one
> >         type of CPU, and then trying to run it on another.
> >
> >         Can you share some more details on what type of CPU is in the
> >         master
> >         node and slave nodes?
> >
> >         If you can, try using the archspec tool (see
> >         https://github.com/archspec/archspec, install with "pip3 install
> >         archspec", then run "archspec cpu").
> >
> >         Or share the output of the following commands:
> >
> >         grep 'model name' /proc/cpuinfo  | head -1
> >
> >
> >         grep flags /proc/cpuinfo | head -1
> >
> >
> >     Master node:
> >
> >     model name : Dual-Core AMD Opteron(tm) Processor 2214
> >
> >     flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> >     cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
> >     fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid
> >     pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch
> vmmcall
> >
> >
> >     Slaves:
> >
> >     model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
> >
> >     flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> >     cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> >     syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
> >     rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> >     dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
> >     pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> >     xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb
> >     cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp
> >     tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
> >     bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap
> >     intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
> >     dtherm ida arat pln pts md_clear flush_l1d
> >
> >         You can also try controlling the optimizations that EasyBuild
> >         does by
> >         default, to prevent that it builds for the specific CPU in the
> >         build
> >         node, using "eb --optarch=GENERIC", see
> >
> >         https://docs.easybuild.io/en/latest/Controlling_compiler_optimization_flags.html
> >
> >
> >     I tried doing
> >
> >     eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --optarch=GENERIC -r --force
> >
> >     but the problem is still the same. Maybe the problem is not in this
> >     particular code (PySCF) but in some of its dependencies. Is there
> >     something like a "--force" flag to force dependencies to recompile?
> >
> >         George's suggestion is better/easier though: building on the
> >         oldest node
> >         should help you too...
> >
> >
> >     I tried this a couple of days ago, but it didn't resolve the
> >     problem. In fact, when doing so, I cannot run the code on the master
> >     (as expected), but I cannot run it on the slaves either...
> >
> >         regards,
> >
> >         Kenneth
> >
> >
> >
> >     Thank you for your help!
> >
> >     Agustín
> >
> >         On 02/06/2021 22:20, Agustín Aucar wrote:
> >          > Dear George,
> >          >
> >          > Thanks for your response. A few days ago, I tried to compile
> >         the code in
> >          > a slave node, but it didn't solve the problem...
> >          >
> >          > Best,
> >          > Agustín
> >          >
> >          > On Wed, Jun 2, 2021 at 11:41, George Tsouloupas
> >          > (<[email protected]>) wrote:
> >          >
> >          >     Hi,
> >          >
> >          >     In a similar situation we ended up just building the
> >         software on the
> >          >     "older" cpu (i.e. the "slave" in your case)
> >          >
> >          >     G.
> >          >
> >          >
> >          >     George Tsouloupas, PhD
> >          >     HPC Facility Technical Director
> >          >     The Cyprus Institute
> >          >     tel: +357 22208688
> >          >
> >          >     On 6/2/21 4:22 PM, Agustín Aucar wrote:
> >          >>     Dear EasyBuild experts,
> >          >>
> >          >>     Firstly, thank you for your very nice work!
> >          >>
> >          >>     I'm trying to compile PySCF with the following *.eb file:
> >          >>
> >          >>     easyblock = 'CMakeMakeCp'
> >          >>
> >          >>     name = 'PySCF'
> >          >>     version = '2.0.0a'
> >          >>     versionsuffix = '-Python-%(pyver)s'
> >          >>
> >          >>     homepage = 'http://www.pyscf.org'
> >          >>     description = "PySCF is an open-source collection of
> >         electronic
> >          >>     structure modules powered by Python."
> >          >>
> >          >>     toolchain = {'name': 'foss', 'version': '2020b'}
> >          >>
> >          >>     source_urls = ['https://github.com/pyscf/pyscf/archive/']
> >          >>     sources = ['v%(version)s.tar.gz']
> >          >>     checksums =
> >          >>         ['20f4c9faf65436a97f9dfc8099d3c79b988b0a2c5374c701fbe35abc6fad4922']
> >          >>
> >          >>     builddependencies = [('CMake', '3.18.4')]
> >          >>
> >          >>     dependencies = [
> >          >>         ('Python', '3.8.6'),
> >          >>         ('SciPy-bundle', '2020.11'),  # for numpy, scipy
> >          >>         ('h5py', '3.1.0'),
> >          >>         ('qcint', '4.0.6', versionsuffix),
> >          >>         ('libxc', '5.1.3'),
> >          >>         ('XCFun', '2.1.1'),
> >          >>     ]
> >          >>
> >          >>     start_dir = 'pyscf/lib'
> >          >>
> >          >>     separate_build_dir = True
> >          >>
> >          >>     configopts = "-DBUILD_LIBCINT=OFF -DBUILD_LIBXC=OFF
> >          >>     -DBUILD_XCFUN=OFF "
> >          >>
> >          >>     prebuildopts = "export
> >          >>     PYSCF_INC_DIR=$EBROOTQCINT/include:$EBROOTLIBXC/lib && "
> >          >>
> >          >>     files_to_copy = ['pyscf']
> >          >>
> >          >>     sanity_check_paths = {
> >          >>         'files': ['pyscf/__init__.py'],
> >          >>         'dirs': ['pyscf/data', 'pyscf/lib'],
> >          >>     }
> >          >>
> >          >>     sanity_check_commands = ["python -c 'import pyscf'"]
> >          >>
> >          >>     modextrapaths = {'PYTHONPATH': '', 'PYSCF_EXT_PATH': ''}
> >          >>
> >          >>     moduleclass = 'chem'
> >          >>
> >          >>
> >          >>     Even if the module is created, I am having trouble
> >          >>     running it on a node different from the master. In
> >          >>     particular, when I load the module and run the code on
> >          >>     the master, it all goes OK:
> >          >>
> >          >>     module load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
> >          >>     python
> >          >>     from pyscf import gto, scf
> >          >>     mol = gto.M(atom='H 0 0 0; H 0 0 1')
> >          >>     mf = scf.RHF(mol).run()
> >          >>
> >          >>     but when I try to run it on a node different from the
> >         master, I get:
> >          >>
> >          >>     Python 3.8.6 (default, Jun  1 2021, 16:43:49)
> >          >>     [GCC 10.2.0] on linux
> >          >>     Type "help", "copyright", "credits" or "license" for
> >         more information.
> >          >>     >>> from pyscf import gto, scf
> >          >>     >>> mol = gto.M(atom='H 0 0 0; H 0 0 1')
> >          >>     >>> mf = scf.RHF(mol).run()
> >          >>     Illegal instruction (core dumped)
> >          >>
> >          >>     As far as I read in different places, it seems to be
> >          >>     related to the different architectures of our master and
> >          >>     slave nodes.
> >          >>
> >          >>     If I execute
> >          >>
> >          >>     grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr
> >         '[:upper:]'
> >          >>     '[:lower:]' | { read FLAGS; OPT="-march=native"; for
> flag in
> >          >>     $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3"
> >         | "fma" |
> >          >>     "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";;
> >         esac; done;
> >          >>     MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
> >          >>
> >          >>     on the slaves I get: -march=native -mssse3 -mfma -mcx16
> >         -msse4.1
> >          >>     -msse4.2 -mpopcnt -mavx -mavx2
> >          >>
> >          >>     whereas on the master node we have: -march=native -mcx16
> >          >>
> >          >>     I tried to compile PySCF by adding these lines to my
> >         *.eb file:
> >          >>
> >          >>     configopts += "-DBUILD_FLAGS='-march=native -mssse3
> >         -mfma -mcx16
> >          >>     -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
> >          >>     configopts += "-DCMAKE_C_FLAGS='-march=native -mssse3
> >         -mfma -mcx16
> >          >>     -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
> >          >>     configopts += "-DCMAKE_CXX_FLAGS='-march=native -mssse3
> >         -mfma
> >          >>     -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
> >          >>     configopts += "-DCMAKE_FORTRAN_FLAGS='-march=native
> >         -mssse3 -mfma
> >          >>     -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2'"
> >          >>
> >          >>     but in that case the code runs neither on the master
> >          >>     nor on the slaves.
> >          >>
> >          >>
> >          >>     I'm sorry if this is a stupid question; I am far from
> >          >>     being a system admin...
> >          >>
> >          >>     Thanks a lot for your help.
> >          >>
> >          >>     Dr. Agustín Aucar
> >          >>     Institute for Modeling and Innovative Technologies -
> >         Argentina
> >          >
> >
>
