Some additional information from our debugging. Many of the tests seem to time out, so we're investigating MPI problems.
Because the CI/CD tests at https://dashboard.cp2k.org were mostly using MPICH, we built a new gmpolf-2019 toolchain and tried with that. And indeed, that gives much better results:

Summary of the regression tester run from 2019-09-18_18-46-51 using Linux-x86-64-gmpolf popt
Number of FAILED  tests 38
Number of WRONG   tests 3
Number of CORRECT tests 3007
Number of NEW     tests 19
Total number of   tests 3067
--------------------------------------------------------------------------
Number of LEAKING tests 0
Number of memory  leaks 0
--------------------------------------------------------------------------

The only abort reason left is the following:

38 * \___/ KS energy is an abnormal value (NaN/Inf). *

This could be related to the Skylake CPUs, cf. https://github.com/xianyi/OpenBLAS/issues/2029, although the OpenBLAS we use should already include the patch for that (PR #8227).

Any feedback or recommendations?

Best Greetings
André

----- On 17 Sep 2019 at 18:49, Andre Gemuend [email protected] wrote:

> Dear EasyBuilders,
>
> we are currently trying to use the CP2K config that is shipped with the
> easyconfigs, more specifically CP2K-6.1-foss-2019a.eb. Unfortunately, we are
> seeing a lot of runtime issues with this version. The CP2K regression test
> suite is not very happy either. This is the summary we get:
>
> Summary of the regression tester run from 2019-09-11_13-29-39 using
> Linux-x86-64-foss popt
> Number of FAILED  tests 288
> Number of WRONG   tests 559
> Number of CORRECT tests 2203
> Number of NEW     tests 17
> Total number of   tests 3067
> --------------------------------------------------------------------------
> Number of LEAKING tests 0
> Number of memory  leaks 0
> --------------------------------------------------------------------------
>
> When looking at the error_summary, we see mostly "SCF not converged" (55
> cases) and "tr(Ap_j*p_j) < 0" (51 cases).
>
> I'm curious whether other users see the same, or whether it has something to
> do with our environment.
>
> We are on CentOS 7.6 and have Xeon Gold (Skylake EP) on these compute nodes.
>
> We would be happy for any help or suggestions.
>
> Best Greetings
> --
> Dipl.-Inf. André Gemünd, Leiter IT-S
> Fraunhofer-Institute for Algorithms and Scientific Computing
> [email protected]
> Tel: +49 2241 14-2193
> /C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend

--
Dipl.-Inf. André Gemünd, Leiter IT-S
Fraunhofer-Institute for Algorithms and Scientific Computing
[email protected]
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend

