Dear EasyBuild experts,

I tried to recompile some of the dependencies of the PySCF code by using:

eb name-of-file.eb --optarch=GENERIC -r --force

but the result is still the same. So far I have recompiled 5 or 6 of the 36
modules that get loaded. Is there a way to estimate which module is causing
the problem, so that I can avoid recompiling all 36 of them?

The loaded modules (after "module purge && module load
chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6") are:

Currently Loaded Modules:
   1) compiler/GCCcore/10.2.0
   2) lib/zlib/1.2.11-GCCcore-10.2.0
   3) tools/binutils/2.35-GCCcore-10.2.0
   4) compiler/GCC/10.2.0
   5) tools/numactl/2.0.13-GCCcore-10.2.0
   6) tools/XZ/5.2.5-GCCcore-10.2.0
   7) lib/libxml2/2.9.10-GCCcore-10.2.0
   8) system/libpciaccess/0.16-GCCcore-10.2.0
   9) system/hwloc/2.2.0-GCCcore-10.2.0
  10) lib/libevent/2.1.12-GCCcore-10.2.0
  11) lib/UCX/1.9.0-GCCcore-10.2.0
  12) lib/libfabric/1.11.0-GCCcore-10.2.0
  13) lib/PMIx/3.1.5-GCCcore-10.2.0
  14) mpi/OpenMPI/4.0.5-GCC-10.2.0
  15) numlib/OpenBLAS/0.3.12-GCC-10.2.0
  16) toolchain/gompi/2020b
  17) numlib/FFTW/3.3.8-gompi-2020b
  18) numlib/ScaLAPACK/2.1.0-gompi-2020b
  19) toolchain/foss/2020b
  20) tools/bzip2/1.0.8-GCCcore-10.2.0
  21) devel/ncurses/6.2-GCCcore-10.2.0
  22) lib/libreadline/8.0-GCCcore-10.2.0
  23) lang/Tcl/8.6.10-GCCcore-10.2.0
  24) devel/SQLite/3.33.0-GCCcore-10.2.0
  25) math/GMP/6.2.0-GCCcore-10.2.0
  26) lib/libffi/3.3-GCCcore-10.2.0
  27) lang/Python/3.8.6-GCCcore-10.2.0
  28) lib/pybind11/2.6.0-GCCcore-10.2.0
  29) lang/SciPy-bundle/2020.11-foss-2020b
  30) tools/Szip/2.1.1-GCCcore-10.2.0
  31) data/HDF5/1.10.7-gompi-2020b
  32) data/h5py/3.1.0-foss-2020b
  33) chem/qcint/4.0.6-foss-2020b-Python-3.8.6
  34) chem/libxc/5.1.3-GCC-10.2.0
  35) chem/XCFun/2.1.1-GCCcore-10.2.0
  36) chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
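
Before rebuilding module by module, it may help to see exactly which CPU
features differ between the two node types, since a feature that exists on
the build node but not on the node where the code runs is a candidate for the
"Illegal instruction". The following is only a rough sketch: the function
names parse_flags and compare_flags are made up for this illustration, and
the two strings are the "flags" lines from /proc/cpuinfo quoted further down
in this thread.

```python
def parse_flags(cpuinfo_flags_line):
    """Return the set of CPU flags from a /proc/cpuinfo 'flags : ...' line."""
    # Drop the "flags :" label if present, then split on whitespace.
    _, _, rest = cpuinfo_flags_line.rpartition(":")
    return set(rest.split())


def compare_flags(build_flags, run_flags):
    """Return (flags only on the build node, flags only on the run node)."""
    build_only = sorted(build_flags - run_flags)
    run_only = sorted(run_flags - build_flags)
    return build_only, run_only


# Flags of the master node (Dual-Core AMD Opteron 2214), as quoted in the thread.
MASTER = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm "
    "3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm cmp_legacy "
    "svm extapic cr8_legacy 3dnowprefetch vmmcall"
)

# Flags of the slave nodes (Intel Xeon E5-2620 v4), as quoted in the thread.
SLAVE = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb "
    "rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology "
    "nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx "
    "est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe "
    "popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm "
    "3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin "
    "ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase "
    "tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx "
    "smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local "
    "dtherm ida arat pln pts md_clear flush_l1d"
)

if __name__ == "__main__":
    master_only, slave_only = compare_flags(parse_flags(MASTER), parse_flags(SLAVE))
    print("only on master:", " ".join(master_only))
    print("only on slaves:", " ".join(slave_only))
```

With these two flag lists, the master-only set includes 3dnow and 3dnowext,
and the slave-only set includes ssse3, sse4_1, sse4_2, avx and avx2. So a
build tuned for the master could in principle use instructions the Xeon nodes
lack, and the reverse holds for a build tuned for the slaves. This does not
identify the offending module by itself, but it shows which instruction sets
are worth looking for.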


Thank you in advance for any help,
Agustín

On Thu, 3 Jun 2021 at 8:03, Agustín Aucar (<[email protected]>)
wrote:

> Dear Åke and Kenneth,
>
> Thank you very much for your replies.
>
> On Thu, 3 Jun 2021 at 4:00, Kenneth Hoste (<[email protected]>)
> wrote:
>
>> Dear Agustín,
>>
>> The fundamental problem is indeed that you're building software on one
>> type of CPU, and then trying to run it on another.
>>
>> Can you share some more details on what type of CPU is in the master
>> node and slave nodes?
>>
>> If you can, try using the archspec tool (see
>> https://github.com/archspec/archspec, install with "pip3 install
>> archspec", then run "archspec cpu").
>>
>> Or share the output of the following commands:
>>
>> grep 'model name' /proc/cpuinfo | head -1
>> grep flags /proc/cpuinfo | head -1
>>
>
> Master node:
>
> model name : Dual-Core AMD Opteron(tm) Processor 2214
>
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm
> 3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm cmp_legacy
> svm extapic cr8_legacy 3dnowprefetch vmmcall
>
>
> Slaves:
>
> model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
>
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx
> est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
> popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin
> ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase
> tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx
> smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
> dtherm ida arat pln pts md_clear flush_l1d
>
>
>> You can also try controlling the optimizations that EasyBuild does by
>> default, to prevent it from building for the specific CPU in the build
>> node, using "eb --optarch=GENERIC", see
>> https://docs.easybuild.io/en/latest/Controlling_compiler_optimization_flags.html
>>
>
> I tried doing
>
> eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --optarch=GENERIC -r --force
>
> but the problem is still the same. Maybe the problem is not in this
> particular code (PySCF) but in some of its dependencies. Is there something
> like a "--force" flag to force dependencies to recompile?
>
>
>
>> George's suggestion is better/easier though: building on the oldest node
>> should help you too...
>>
>
> I tried this a couple of days ago, but it didn't resolve the problem. In
> fact, when I do so, I cannot run the code on the master (as expected), but
> I cannot run it on the slaves either...
>
>> regards,
>>
>> Kenneth
>>
>
>
> Thank you for your help!
>
> Agustín
>
>
>
>> On 02/06/2021 22:20, Agustín Aucar wrote:
>> > Dear George,
>> >
>> > Thanks for your response. A few days ago, I tried to compile the code on
>> > a slave node, but it didn't solve the problem...
>> >
>> > Best,
>> > Agustín
>> >
>> > On Wed, 2 Jun 2021 at 11:41, George Tsouloupas
>> > (<[email protected]>) wrote:
>> >
>> >     Hi,
>> >
>> >     In a similar situation we ended up just building the software on the
>> >     "older" cpu (i.e. the "slave" in your case)
>> >
>> >     G.
>> >
>> >
>> >     George Tsouloupas, PhD
>> >     HPC Facility Technical Director
>> >     The Cyprus Institute
>> >     tel: +357 22208688
>> >
>> >     On 6/2/21 4:22 PM, Agustín Aucar wrote:
>> >>     Dear EasyBuild experts,
>> >>
>> >>     Firstly, thank you for your very nice work!
>> >>
>> >>     I'm trying to compile PySCF with the following *.eb file:
>> >>
>> >>     easyblock = 'CMakeMakeCp'
>> >>
>> >>     name = 'PySCF'
>> >>     version = '2.0.0a'
>> >>     versionsuffix = '-Python-%(pyver)s'
>> >>
>> >>     homepage = 'http://www.pyscf.org'
>> >>     description = "PySCF is an open-source collection of electronic structure modules powered by Python."
>> >>
>> >>     toolchain = {'name': 'foss', 'version': '2020b'}
>> >>
>> >>     source_urls = ['https://github.com/pyscf/pyscf/archive/']
>> >>     sources = ['v%(version)s.tar.gz']
>> >>     checksums = ['20f4c9faf65436a97f9dfc8099d3c79b988b0a2c5374c701fbe35abc6fad4922']
>> >>
>> >>     builddependencies = [('CMake', '3.18.4')]
>> >>
>> >>     dependencies = [
>> >>         ('Python', '3.8.6'),
>> >>         ('SciPy-bundle', '2020.11'),  # for numpy, scipy
>> >>         ('h5py', '3.1.0'),
>> >>         ('qcint', '4.0.6', versionsuffix),
>> >>         ('libxc', '5.1.3'),
>> >>         ('XCFun', '2.1.1'),
>> >>     ]
>> >>
>> >>     start_dir = 'pyscf/lib'
>> >>
>> >>     separate_build_dir = True
>> >>
>> >>     configopts = "-DBUILD_LIBCINT=OFF -DBUILD_LIBXC=OFF -DBUILD_XCFUN=OFF "
>> >>
>> >>     prebuildopts = "export PYSCF_INC_DIR=$EBROOTQCINT/include:$EBROOTLIBXC/lib && "
>> >>
>> >>     files_to_copy = ['pyscf']
>> >>
>> >>     sanity_check_paths = {
>> >>         'files': ['pyscf/__init__.py'],
>> >>         'dirs': ['pyscf/data', 'pyscf/lib'],
>> >>     }
>> >>
>> >>     sanity_check_commands = ["python -c 'import pyscf'"]
>> >>
>> >>     modextrapaths = {'PYTHONPATH': '', 'PYSCF_EXT_PATH': ''}
>> >>
>> >>     moduleclass = 'chem'
>> >>
>> >>
>> >>     Although the module is created, I am having trouble running it
>> >>     on nodes other than the master. On the master, when I load the
>> >>     module and run the code, everything works:
>> >>
>> >>     module load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
>> >>     python
>> >>     from pyscf import gto, scf
>> >>     mol = gto.M(atom='H 0 0 0; H 0 0 1')
>> >>     mf = scf.RHF(mol).run()
>> >>
>> >>     but when I try to run it on a node different from the master, I get:
>> >>
>> >>     Python 3.8.6 (default, Jun  1 2021, 16:43:49)
>> >>     [GCC 10.2.0] on linux
>> >>     Type "help", "copyright", "credits" or "license" for more information.
>> >>     >>> from pyscf import gto, scf
>> >>     >>> mol = gto.M(atom='H 0 0 0; H 0 0 1')
>> >>     >>> mf = scf.RHF(mol).run()
>> >>     Illegal instruction (core dumped)
>> >>
>> >>     From what I have read, this seems to be related to the
>> >>     different architectures of our master and slave nodes.
>> >>
>> >>     If I execute
>> >>
>> >>     grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]'
>> >>     '[:lower:]' | { read FLAGS; OPT="-march=native"; for flag in
>> >>     $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" |
>> >>     "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done;
>> >>     MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
>> >>
>> >>     on the slaves I get: -march=native -mssse3 -mfma -mcx16 -msse4.1
>> >>     -msse4.2 -mpopcnt -mavx -mavx2
>> >>
>> >>     whereas on the master node we have: -march=native -mcx16
>> >>
>> >>     I tried to compile PySCF by adding these lines to my *.eb file:
>> >>
>> >>     configopts += "-DBUILD_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
>> >>     configopts += "-DCMAKE_C_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
>> >>     configopts += "-DCMAKE_CXX_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
>> >>     configopts += "-DCMAKE_FORTRAN_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2'"
>> >>
>> >>     but in that case the code runs neither on the master nor on the
>> >>     slaves.
>> >>
>> >>
>> >>     I'm sorry if it is a stupid question. I am far from being a system
>> >>     admin...
>> >>
>> >>     Thanks a lot for your help.
>> >>
>> >>     Dr. Agustín Aucar
>> >>     Institute for Modeling and Innovative Technologies - Argentina
>> >
>>
>
