The MPI results are interesting. But I don't understand why one runtime would
perform better on one processor than on another. MPI is a communication
runtime, so performance should be limited by the algorithms and the I/O
subsystem, not by the microarchitecture. Could it be that the process pinning
was set incorrectly by Intel MPI on the EPYC processor? That would make more
sense to me.
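
For what it's worth, a quick way to check this would be to print the CPU
affinity of every rank under both runtimes. A minimal sketch in Python,
assuming Linux and an mpi4py installation built against the MPI runtime under
test:

    # check_pinning.py - print where each MPI rank is actually pinned
    # (sketch: assumes Linux and an mpi4py build matching the MPI runtime under test)
    import os
    import socket

    from mpi4py import MPI

    rank = MPI.COMM_WORLD.Get_rank()

    # os.sched_getaffinity(0) returns the set of logical CPUs this process may
    # run on, i.e. the pinning the launcher (Intel MPI or OpenMPI) actually applied
    cpus = sorted(os.sched_getaffinity(0))
    print(f"rank {rank:3d} on {socket.gethostname()}: cpus {cpus}")

Launching that with mpirun under Intel MPI and under OpenMPI on the same node
would show directly whether the ranks end up spread over the EPYC NUMA domains
or piled onto a few cores.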

Cheers,
Damian

From: <[email protected]> on behalf of Miguel Costa 
<[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday, 19 August 2018 at 17:02
To: "[email protected]" <[email protected]>
Subject: Re: [easybuild] experiences with EasyBuild on AMD Epyc?

Well, it's Sunday and late, so take this with a grain of salt, but here are
results from a real application benchmark on EPYC, where the only difference is
the toolchain: gomkl (with and without AVX2) or foss:

- without using MKL_DEBUG_CPU_TYPE (so, MKL using AVX, not AVX2)

  - the FFT dominated parts are ~1.6x faster with gomkl than with foss

  - but the linear algebra dominated parts are ~1.3x faster with foss than with 
gomkl.

- using MKL_DEBUG_CPU_TYPE=5, MKL does use AVX2, and now

  - the FFT dominated parts are ~1.9x faster with gomkl than with foss

  - the linear algebra dominated parts are ~1.4x faster with gomkl than with 
foss (so almost 2x compared to gomkl with AVX, as expected)

So Intel MKL not only seems to run fine on EPYC but also to be the best option
(are we missing any optimizations in the foss toolchain on EPYC, especially for
FFTW?)
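
(If anyone wants to reproduce the dispatch effect in isolation, here is a
minimal sketch, assuming a numpy build linked against MKL, which is what gomkl
provides: time the same matmul in a subprocess with and without
MKL_DEBUG_CPU_TYPE=5, since MKL picks its code path when the library is loaded.

    # mkl_avx2_check.py - compare an MKL-backed matmul with and without AVX2 dispatch
    # (sketch: assumes numpy is linked against Intel MKL)
    import os
    import subprocess
    import sys

    TIMER = (
        "import time, numpy as np;"
        "a = np.random.rand(4096, 4096); b = np.random.rand(4096, 4096);"
        "a @ b;"  # warm-up so MKL's threads and dispatch are initialized
        "t = time.perf_counter(); a @ b; print(time.perf_counter() - t)"
    )

    for label, extra in [("default (AVX)", {}),
                         ("MKL_DEBUG_CPU_TYPE=5 (AVX2)", {"MKL_DEBUG_CPU_TYPE": "5"})]:
        env = dict(os.environ, **extra)
        out = subprocess.run([sys.executable, "-c", TIMER], env=env,
                             capture_output=True, text=True, check=True)
        print(f"{label}: {out.stdout.strip()} s")

On EPYC the second run should be noticeably faster if the variable is picked
up.)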

(Intel MPI, on the other hand, does not seem to run well on EPYC. I had first
tried gimkl instead of gomkl, but while single-core performance was better than
with foss, multi-core performance was much worse; that issue disappeared with
gomkl, which uses OpenMPI.)

My two (sing)cents,
Miguel


On Fri, Aug 17, 2018 at 4:55 AM Mikael Öhman <[email protected]> wrote:
We bought a single EPYC node for testing, but we only reused our existing
software library (built on Haswell, which has a similar instruction set).

At the last EB user meeting I was recommended the undocumented
MKL_DEBUG_CPU_TYPE environment variable to force MKL to use AVX2.
As one might expect, the binaries built for Haswell couldn't run, because Intel
"helpfully" inserts a CPU check that refuses to start whenever you compile with
-xHost.
The VASP code, which we had only compiled with -xavx, ran fine.

All the foss code I have tested seemed to run fine. We had a PhD student
benchmark OpenFOAM for us, and using the same binaries (as compiled for an
Intel 2650v3), it ran ~twice as fast on the 2x16-core EPYC node as on a
2x10-core 2650v3 node.

Best regards, Mikael


On Thu, Aug 16, 2018 at 2:38 PM Kenneth Hoste <[email protected]> wrote:
Dear EasyBuilders,

Does anyone here have experience with using EasyBuild on AMD Epyc systems?

Do the common toolchains (foss/2018a, intel/2018a or older) work out of
the box, or did you have to make some tweaks?
In particular, did the default of compiling with -xHost with the Intel
compilers work fine?

In addition: any experiences on how the performance compares to recent
Intel systems for particular applications?


regards,

Kenneth


