Some additional information from our debugging: many of the tests appear to
time out, so we are investigating MPI problems.

Because the CI/CD tests at https://dashboard.cp2k.org were mostly using MPICH,
we built a new gmpolf-2019 toolchain and tried with that.

And indeed, that gives much better results:

Summary of the regression tester run from 2019-09-18_18-46-51 using 
Linux-x86-64-gmpolf popt 
Number of FAILED  tests 38
Number of WRONG   tests 3
Number of CORRECT tests 3007
Number of NEW     tests 19
Total number of   tests 3067
--------------------------------------------------------------------------
Number of LEAKING tests 0
Number of memory  leaks 0
--------------------------------------------------------------------------
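To compare runs more easily, a small script along these lines could be used. This is only a sketch: it assumes the summary is available as plain text in exactly the format shown in this thread, and the names parse_summary and correct_rate are made up for illustration; they are not part of the CP2K regression tester.

```python
import re

# Summary counts copied from the two runs reported in this thread.
FOSS = """
Number of FAILED  tests 288
Number of WRONG   tests 559
Number of CORRECT tests 2203
Number of NEW     tests 17
Total number of   tests 3067
"""

GMPOLF = """
Number of FAILED  tests 38
Number of WRONG   tests 3
Number of CORRECT tests 3007
Number of NEW     tests 19
Total number of   tests 3067
"""


def parse_summary(text):
    """Return {category: count} for the per-category and total lines.

    Hypothetical helper: it tolerates a leading '> ' quote prefix and
    ignores every line that does not match the assumed format.
    """
    counts = {}
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        m = re.match(r"Number of\s+(FAILED|WRONG|CORRECT|NEW)\s+tests\s+(\d+)$", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
            continue
        m = re.match(r"Total number of\s+tests\s+(\d+)$", line)
        if m:
            counts["TOTAL"] = int(m.group(1))
    return counts


def correct_rate(counts):
    """Fraction of all tests that were reported as CORRECT."""
    return counts["CORRECT"] / counts["TOTAL"]


foss = parse_summary(FOSS)
gmpolf = parse_summary(GMPOLF)

for name, counts in (("foss", foss), ("gmpolf", gmpolf)):
    # Sanity check: the four categories should add up to the total.
    parts = sum(counts[k] for k in ("FAILED", "WRONG", "CORRECT", "NEW"))
    print(f"{name}: {correct_rate(counts):.1%} correct, "
          f"categories sum to total: {parts == counts['TOTAL']}")
```

In both runs the four categories add up to the reported total of 3067 tests, so the two summaries are directly comparable.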

The only abort reason left is the following:

     38  KS energy is an abnormal value (NaN/Inf).

This could be related to the Skylake CPUs, cf.
https://github.com/xianyi/OpenBLAS/issues/2029

However, the OpenBLAS we use should already have the patch for that (PR #8227).
Any feedback or recommendations?

Best Greetings
André

----- On 17 Sep 2019 at 18:49, Andre Gemuend
[email protected] wrote:

> Dear EasyBuilders,
> 
> we are currently trying to use the CP2K config that is shipped with the
> easyconfigs, more specifically CP2K-6.1-foss-2019a.eb. Unfortunately, we are
> seeing a lot of runtime issues with this version. The CP2K regression test
> suite is also not very happy. This is the summary we get:
> 
> Summary of the regression tester run from 2019-09-11_13-29-39 using
> Linux-x86-64-foss popt
> Number of FAILED  tests 288
> Number of WRONG   tests 559
> Number of CORRECT tests 2203
> Number of NEW     tests 17
> Total number of   tests 3067
> --------------------------------------------------------------------------
> Number of LEAKING tests 0
> Number of memory  leaks 0
> --------------------------------------------------------------------------
> 
> When looking at the error_summary, we see mostly "SCF not converged"
> (55 cases) and "tr(Ap_j*p_j) < 0" (51 cases).
> 
> I'm curious if other users see the same or if it has something to do with our
> environment?
> 
> We are on CentOS 7.6 and have Xeon Gold (Skylake EP) on these compute nodes.
> 
> We would be happy for any help or suggestions.
> 
> Best Greetings
> --
> Dipl.-Inf. André Gemünd, Leiter IT-S
> Fraunhofer-Institute for Algorithms and Scientific Computing
> [email protected]
> Tel: +49 2241 14-2193
> /C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend

-- 
Dipl.-Inf. André Gemünd, Leiter IT-S
Fraunhofer-Institute for Algorithms and Scientific Computing
[email protected]
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend
