The MPI results are interesting, but I don't understand why one MPI runtime would perform better on one processor than on another. MPI is a communication runtime; its performance should be limited by the algorithms and the I/O subsystem, not by the microarchitecture. Could it be that the process pinning was set incorrectly by Intel MPI on the EPYC processor? That would make more sense to me.
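A quick way to check would be to print each rank's affinity mask and compare what Intel MPI and OpenMPI actually do on the node. Below is a minimal sketch, assuming mpi4py is available and a Linux node (where os.sched_getaffinity() works); with Intel MPI, setting I_MPI_DEBUG=4 should also make the runtime print its pinning map at startup.

    import os
    import socket

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # CPU ids this process is allowed to run on, i.e. its pinning mask.
    cpus = sorted(os.sched_getaffinity(0))

    # Gather everything on rank 0 so the report comes out in rank order.
    report = comm.gather((rank, socket.gethostname(), cpus), root=0)
    if rank == 0:
        for r, host, mask in report:
            print(f"rank {r:3d} on {host}: cpus {mask}")

Running it under each runtime (e.g. mpirun -np 64 python check_pinning.py, the script name being just an example) and comparing the masks would show whether Intel MPI pins the ranks differently on EPYC.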
Cheers,
Damian

From: <[email protected]> on behalf of Miguel Costa <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday, 19 August 2018 at 17:02
To: "[email protected]" <[email protected]>
Subject: Re: [easybuild] experiences with EasyBuild on AMD Epyc?

Well, it's Sunday and late, so take this with a grain of salt, but on a real application benchmark on EPYC, with the only difference being the toolchain (gomkl, with and without AVX2, vs. foss):

- without MKL_DEBUG_CPU_TYPE (so MKL uses AVX, not AVX2):
  - the FFT-dominated parts are ~1.6x faster with gomkl than with foss
  - but the linear-algebra-dominated parts are ~1.3x faster with foss than with gomkl
- with MKL_DEBUG_CPU_TYPE=5, MKL does use AVX2, and now:
  - the FFT-dominated parts are ~1.9x faster with gomkl than with foss
  - the linear-algebra-dominated parts are ~1.4x faster with gomkl than with foss (so almost 2x compared to gomkl with AVX, as expected)

So Intel MKL seems not only to run fine on EPYC but also to be the best option. (Are we missing any optimizations in the foss toolchain on EPYC, especially for FFTW?) A minimal sketch of setting MKL_DEBUG_CPU_TYPE is appended at the end of this thread.

(Intel MPI, on the other hand, does not seem to run fine on EPYC. I had first tried gimkl instead of gomkl, but while single-core performance was better than with foss, multi-core performance was much worse; that issue disappeared with gomkl, which uses OpenMPI.)

My two (sing)cents,
Miguel

On Fri, Aug 17, 2018 at 4:55 AM Mikael Öhman <[email protected]> wrote:

We bought a single EPYC node for testing, but we only reused our existing software stack (built on Haswell, which has a similar instruction set). At the last EB user meeting I was recommended the undocumented MKL_DEBUG_CPU_TYPE variable to force MKL to use AVX2.

As one might expect, the binaries built for Haswell couldn't run, because Intel "helpfully" inserts a CPU check that blocks startup whenever you compile with -xHost. The VASP code, which we had only compiled with -xAVX, ran fine. All the foss code I have tested seemed to run fine.

We had a PhD student benchmark OpenFOAM for us, and using the same binaries (compiled for an Intel 2650v3), it ran about twice as fast on the 2x16-core EPYC node as on a 2x10-core 2650v3 node.

Best regards,
Mikael

On Thu, Aug 16, 2018 at 2:38 PM Kenneth Hoste <[email protected]> wrote:

Dear EasyBuilders,

Does anyone here have experience with using EasyBuild on AMD Epyc systems? Do the common toolchains (foss/2018a, intel/2018a or older) work out of the box, or did you have to make some tweaks? In particular, did the default of compiling with -xHost with the Intel compilers work fine?

In addition: any experiences on how the performance compares to recent Intel systems for particular applications?

regards,

Kenneth
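As an appendix to the MKL_DEBUG_CPU_TYPE discussion above: a minimal sketch, assuming a numpy linked against MKL (e.g. from a gomkl or intel toolchain). The variable is undocumented, so it has to be in the environment before MKL initialises -- in the job script, or in Python before numpy is imported. Timing the same run with and without it should show whether the AVX2 code path is being picked up on EPYC.

    import os
    import time

    # MKL_DEBUG_CPU_TYPE=5 asks MKL to take its AVX2 code path even on
    # non-Intel CPUs; set it before importing numpy so MKL sees it.
    os.environ.setdefault("MKL_DEBUG_CPU_TYPE", "5")

    import numpy as np

    # Crude timing of an MKL-backed matrix multiply (DGEMM).
    n = 4096
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    t0 = time.perf_counter()
    c = a @ b
    print(f"{n}x{n} dgemm took {time.perf_counter() - t0:.2f} s")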

