Dear Kenneth,

Thank you so much for your kind reply.

On Thu, 3 Jun 2021 at 15:18, Kenneth Hoste (<[email protected]>) wrote:

> Dear Agustín,
>
> I'm not sure if there's an easy way to determine which library is
> causing the "Illegal instruction" error, but it's possibly not a single
> specific library, but several...
>
> I suggest you try re-installing all modules on the slave nodes (the
> oldest CPUs), if that's feasible.
>

I think the oldest CPUs are not those of the slave nodes but those of the master.

Master node:
model name : Dual-Core AMD Opteron(tm) Processor 2214

Slaves:
model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

As far as I can see on this web site
<http://cpuboss.com/cpus/Intel-Xeon-E5-2620-vs-AMD-Opteron-2214>, the AMD CPU
(our master node) is older than the Intel ones (slaves). Am I wrong?

> When you use "eb --force", only the easyconfig files specified to the eb
> command are reinstalled.
> There's no command line option to re-install everything, since it's
> pretty rare to actually have to do this.
>
> The easiest way would be to remove the module files, and then reinstall
> PySCF with "eb --robot".
>

OK. Then I will remove all *.lua files (from the 36 modules, including *foss*
and *GCCcore*), and then reinstall all of them, but from a slave node.

I will report my results. Thank you for your valuable advice!

Agustín

> regards,
>
> Kenneth
>
> On 03/06/2021 19:48, Agustín Aucar wrote:
> > Dear EasyBuild experts,
> >
> > I tried to recompile some of the dependencies of the PySCF code by using:
> >
> > eb name-of-file.eb --optarch=GENERIC -r --force
> >
> > but the results are still the same. I recompiled 5 or 6 of the 36
> > "dependent" modules... Is there a way to somehow estimate which module
> > is causing this problem, to avoid recompiling each of the 36 modules?
> >
> > The loaded modules (module purge && module
> > load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6) are
> >
> > Currently Loaded Modules:
> >    1) compiler/GCCcore/10.2.0
> >    2) lib/zlib/1.2.11-GCCcore-10.2.0
> >    3) tools/binutils/2.35-GCCcore-10.2.0
> >    4) compiler/GCC/10.2.0
> >    5) tools/numactl/2.0.13-GCCcore-10.2.0
> >    6) tools/XZ/5.2.5-GCCcore-10.2.0
> >    7) lib/libxml2/2.9.10-GCCcore-10.2.0
> >    8) system/libpciaccess/0.16-GCCcore-10.2.0
> >    9) system/hwloc/2.2.0-GCCcore-10.2.0
> >   10) lib/libevent/2.1.12-GCCcore-10.2.0
> >   11) lib/UCX/1.9.0-GCCcore-10.2.0
> >   12) lib/libfabric/1.11.0-GCCcore-10.2.0
> >   13) lib/PMIx/3.1.5-GCCcore-10.2.0
> >   14) mpi/OpenMPI/4.0.5-GCC-10.2.0
> >   15) numlib/OpenBLAS/0.3.12-GCC-10.2.0
> >   16) toolchain/gompi/2020b
> >   17) numlib/FFTW/3.3.8-gompi-2020b
> >   18) numlib/ScaLAPACK/2.1.0-gompi-2020b
> >   19) toolchain/foss/2020b
> >   20) tools/bzip2/1.0.8-GCCcore-10.2.0
> >   21) devel/ncurses/6.2-GCCcore-10.2.0
> >   22) lib/libreadline/8.0-GCCcore-10.2.0
> >   23) lang/Tcl/8.6.10-GCCcore-10.2.0
> >   24) devel/SQLite/3.33.0-GCCcore-10.2.0
> >   25) math/GMP/6.2.0-GCCcore-10.2.0
> >   26) lib/libffi/3.3-GCCcore-10.2.0
> >   27) lang/Python/3.8.6-GCCcore-10.2.0
> >   28) lib/pybind11/2.6.0-GCCcore-10.2.0
> >   29) lang/SciPy-bundle/2020.11-foss-2020b
> >   30) tools/Szip/2.1.1-GCCcore-10.2.0
> >   31) data/HDF5/1.10.7-gompi-2020b
> >   32) data/h5py/3.1.0-foss-2020b
> >   33) chem/qcint/4.0.6-foss-2020b-Python-3.8.6
> >   34) chem/libxc/5.1.3-GCC-10.2.0
> >   35) chem/XCFun/2.1.1-GCCcore-10.2.0
> >   36) chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
> >
> > Thank you in advance for any help,
> > Agustín
> >
> > On Thu, 3 Jun 2021 at 8:03, Agustín Aucar (<[email protected]>) wrote:
> >
> >     Dear Åke and Kenneth,
> >
> >     Thank you very much for your replies.
> >
> >     On Thu, 3 Jun 2021 at 4:00, Kenneth Hoste (<[email protected]>) wrote:
> >
> >         Dear Agustín,
> >
> >         The fundamental problem is indeed that you're building software on one
> >         type of CPU, and then trying to run it on another.
> >
> >         Can you share some more details on what type of CPU is in the master
> >         node and slave nodes?
> >
> >         If you can, try using the archspec tool (see
> >         https://github.com/archspec/archspec, install with "pip3 install
> >         archspec", then run "archspec cpu").
> >
> >         Or share the output of the following commands:
> >
> >         grep 'model name' /proc/cpuinfo | head -1
> >
> >         grep flags /proc/cpuinfo | head -1
> >
> >     Master node:
> >
> >     model name : Dual-Core AMD Opteron(tm) Processor 2214
> >
> >     flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> >     cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
> >     fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid
> >     pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall
> >
> >     Slaves:
> >
> >     model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
> >
> >     flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> >     cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> >     syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
> >     rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> >     dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
> >     pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> >     xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb
> >     cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp
> >     tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
> >     bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap
> >     intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
> >     dtherm ida arat pln pts md_clear flush_l1d
> >
> >         You can also try controlling the optimizations that EasyBuild does by
> >         default, to prevent it from building for the specific CPU in the build
> >         node, using "eb --optarch=GENERIC", see
> >         https://docs.easybuild.io/en/latest/Controlling_compiler_optimization_flags.html
> >
> >     I tried doing
> >
> >     eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --optarch=GENERIC -r --force
> >
> >     but the problem is still the same. Maybe the problem is not in this
> >     particular code (PySCF) but in some of its dependencies. Is there
> >     something like a "--force" flag to force dependencies to recompile?
> >
> >         George's suggestion is better/easier though: building on the oldest node
> >         should help you too...
> >
> >     I tried this a couple of days ago, but it didn't resolve the
> >     problem. In fact, when doing so, I cannot run the code on the master (as
> >     expected), but I cannot run it on the slaves either...
> >
> >         regards,
> >
> >         Kenneth
> >
> >     Thank you for your help!
> >
> >     Agustín
> >
> >         On 02/06/2021 22:20, Agustín Aucar wrote:
> > > Dear George,
> > >
> > > Thanks for your response. A few days ago, I tried to compile the code on
> > > a slave node, but it didn't solve the problem...
> > >
> > > Best,
> > > Agustín
> > >
> > > On Wed, 2 Jun 2021 at 11:41, George Tsouloupas
> > > (<[email protected]>) wrote:
> > >
> > >     Hi,
> > >
> > >     In a similar situation we ended up just building the software on the
> > >     "older" cpu (i.e. the "slave" in your case)
> > >
> > >     G.
> > >
> > >     George Tsouloupas, PhD
> > >     HPC Facility Technical Director
> > >     The Cyprus Institute
> > >     tel: +357 22208688
> > >
> > >     On 6/2/21 4:22 PM, Agustín Aucar wrote:
> > >> Dear EasyBuild experts,
> > >>
> > >> Firstly, thank you for your very nice work!
> > >>
> > >> I'm trying to compile PySCF with the following *.eb file:
> > >>
> > >> easyblock = 'CMakeMakeCp'
> > >>
> > >> name = 'PySCF'
> > >> version = '2.0.0a'
> > >> versionsuffix = '-Python-%(pyver)s'
> > >>
> > >> homepage = 'http://www.pyscf.org'
> > >> description = "PySCF is an open-source collection of electronic
> > >> structure modules powered by Python."
> > >>
> > >> toolchain = {'name': 'foss', 'version': '2020b'}
> > >>
> > >> source_urls = ['https://github.com/pyscf/pyscf/archive/']
> > >> sources = ['v%(version)s.tar.gz']
> > >> checksums = ['20f4c9faf65436a97f9dfc8099d3c79b988b0a2c5374c701fbe35abc6fad4922']
> > >>
> > >> builddependencies = [('CMake', '3.18.4')]
> > >>
> > >> dependencies = [
> > >>     ('Python', '3.8.6'),
> > >>     ('SciPy-bundle', '2020.11'),  # for numpy, scipy
> > >>     ('h5py', '3.1.0'),
> > >>     ('qcint', '4.0.6', versionsuffix),
> > >>     ('libxc', '5.1.3'),
> > >>     ('XCFun', '2.1.1'),
> > >> ]
> > >>
> > >> start_dir = 'pyscf/lib'
> > >>
> > >> separate_build_dir = True
> > >>
> > >> configopts = "-DBUILD_LIBCINT=OFF -DBUILD_LIBXC=OFF -DBUILD_XCFUN=OFF "
> > >>
> > >> prebuildopts = "export PYSCF_INC_DIR=$EBROOTQCINT/include:$EBROOTLIBXC/lib && "
> > >>
> > >> files_to_copy = ['pyscf']
> > >>
> > >> sanity_check_paths = {
> > >>     'files': ['pyscf/__init__.py'],
> > >>     'dirs': ['pyscf/data', 'pyscf/lib'],
> > >> }
> > >>
> > >> sanity_check_commands = ["python -c 'import pyscf'"]
> > >>
> > >> modextrapaths = {'PYTHONPATH': '', 'PYSCF_EXT_PATH': ''}
> > >>
> > >> moduleclass = 'chem'
> > >>
> > >>
> > >> Even though the module is created, I am having trouble running it
> > >> on a node other than the master. In particular, when I load the
> > >> module and run the code, it all goes OK:
> > >>
> > >> module load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
> > >> python
> > >> from pyscf import gto, scf
> > >> mol = gto.M(atom='H 0 0 0; H 0 0 1')
> > >> mf = scf.RHF(mol).run()
> > >>
> > >> but when I try to run it on a node other than the master, I get:
> > >>
> > >> Python 3.8.6 (default, Jun 1 2021, 16:43:49)
> > >> [GCC 10.2.0] on linux
> > >> Type "help", "copyright", "credits" or "license" for more information.
> > >> >>> from pyscf import gto, scf
> > >> >>> mol = gto.M(atom='H 0 0 0; H 0 0 1')
> > >> >>> mf = scf.RHF(mol).run()
> > >> Illegal instruction (core dumped)
> > >>
> > >> As far as I have read in different places, it seems to be related to
> > >> the different architectures of our master and slave nodes.
> > >>
> > >> If I execute
> > >>
> > >> grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | {
> > >>   read FLAGS; OPT="-march=native";
> > >>   for flag in $FLAGS; do
> > >>     case "$flag" in
> > >>       "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";;
> > >>     esac;
> > >>   done;
> > >>   MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
> > >>
> > >> on the slaves I get: -march=native -mssse3 -mfma -mcx16 -msse4.1
> > >> -msse4.2 -mpopcnt -mavx -mavx2
> > >>
> > >> whereas on the master node we have: -march=native -mcx16
> > >>
> > >> I tried to compile PySCF by adding these lines to my *.eb file:
> > >>
> > >> configopts += "-DBUILD_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
> > >> configopts += "-DCMAKE_C_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
> > >> configopts += "-DCMAKE_CXX_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
> > >> configopts += "-DCMAKE_FORTRAN_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2'"
> > >>
> > >> but in that case the code runs neither on the master nor on the
> > >> slaves.
> > >>
> > >>
> > >> I'm sorry if this is a stupid question. I am far from being a system
> > >> admin...
> > >>
> > >> Thanks a lot for your help.
> > >>
> > >> Dr. Agustín Aucar
> > >> Institute for Modeling and Innovative Technologies - Argentina
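
The two "flags" lines quoted in this thread can be compared directly to see which CPU features each node type has that the other lacks. The following is only a minimal sketch in Python: the flag strings are abbreviated copies of the /proc/cpuinfo output above, and the helper name compare_flags is hypothetical, not part of EasyBuild or archspec.

```python
# Abbreviated copies of the `flags` lines quoted in this thread
# (assumption: paste the full lines from /proc/cpuinfo for a real check).
MASTER_FLAGS = "fpu sse sse2 mmxext 3dnowext 3dnow pni cx16 lahf_lm"
SLAVE_FLAGS = "fpu sse sse2 pni ssse3 fma cx16 sse4_1 sse4_2 popcnt avx lahf_lm avx2"


def compare_flags(a, b):
    """Return (flags only in a, flags only in b, flags in both)."""
    set_a, set_b = set(a.split()), set(b.split())
    return set_a - set_b, set_b - set_a, set_a & set_b


only_master, only_slave, common = compare_flags(MASTER_FLAGS, SLAVE_FLAGS)
print("only on master:", sorted(only_master))
print("only on slaves:", sorted(only_slave))
print("common:", sorted(common))
```

With the flags quoted above, neither set contains the other: the Opteron lists 3dnow, 3dnowext and mmxext, which the Xeon does not, while the Xeon lists ssse3, sse4_1, sse4_2, avx, avx2 and fma, which the Opteron does not. So a build tuned with -march=native on either host may use instructions the other lacks, which would be consistent with "Illegal instruction" appearing in both directions. A build that has to run on both would need to stay within the common flags for every dependency, not only for PySCF itself.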

