Hi Gunnar,

The pingpong latencies are a clear indicator that there is something wrong with 
the MPI runtime. It looks to me like you are using TCP instead of InfiniBand. 
Did you verify that? I have no experience with OpenMPI, so I can't really tell 
you exactly how to check it, but it should be relatively straightforward.
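For what it's worth, the usual way to check with Open MPI (going by its 
documentation; component names can differ between versions, so please verify 
against yours) is via ompi_info and the MCA btl parameters:

```shell
# Does this Open MPI build include the InfiniBand (verbs) transport at all?
ompi_info | grep -i openib

# Force InfiniBand-capable transports only; the job fails loudly if verbs is unavailable:
mpirun --mca btl openib,sm,self -np 2 ./pingpong

# Or simply exclude TCP, so Open MPI has to pick something faster:
mpirun --mca btl ^tcp -np 2 ./pingpong
```

(Here ./pingpong stands in for whatever MPI benchmark you are running.)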

Damian

From: <[email protected]> on behalf of Gunnar Sauer 
<[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday 29 January 2017 at 14:51
To: "[email protected]" <[email protected]>
Subject: Re: [easybuild] toolchain which uses existing, optimized, 
MPI/ScaLAPACK and build environment?

Hello Kenneth,
thanks for coming back to my question. I am sorry to say that I cannot follow 
the EasyBuild route anymore for the purpose of my internship, but I am 
definitely interested in solving the problems (described below) for myself and 
for my future career. (I'll have to buy access to some public cluster like 
Sabalcore or see whether I can work on our university cluster without a 
specific project. So it may take 1-2 weeks until I can proceed.)

I have run the HPCC benchmark (version 1.5.0 from 
http://icl.cs.utk.edu/hpcc/software/index.html) once with the preinstalled 
gcc/openmpi/openblas of the company's Xeon cluster, and secondly with the 
foss/2016b toolchain built previously on the same cluster. This was meant as a 
quick check whether the forum users were right, saying that it doesn't matter 
for the MPI performance whether you use an optimized OpenMPI version or the 
generic EasyBuild OpenMPI built from source - or whether our engineers were 
right, saying that you have to use the system tools including an OpenMPI that 
has been set up for the Infiniband hardware if you want any decent MPI 
performance.

When I presented the numbers below, showing ping pong latencies of 10000 us 
(EasyBuild) compared to 2 us (system tools), we had a quick discussion, and my 
task is now to write a build script independent from EasyBuild, respecting the 
existing tools. Here are the results of the ping pong test, first for the 
system tools (see also attached hpccoutf.system for the complete HPCC output), 
second for the foss/2016b toolchain (see also attached hpccoutf.eb):

System compiler, openmpi, openblas:

Major Benchmark results:
------------------------

Max Ping Pong Latency:                 0.002115 msecs
Randomly Ordered Ring Latency:         0.001384 msecs
Min Ping Pong Bandwidth:            2699.150014 MB/s
Naturally Ordered Ring Bandwidth:    549.443306 MB/s
Randomly  Ordered Ring Bandwidth:    508.267423 MB/s

EasyBuild foss/2016b toolchain:

Major Benchmark results:
------------------------

Max Ping Pong Latency:                10.000019 msecs
Randomly Ordered Ring Latency:         4.251704 msecs
Min Ping Pong Bandwidth:              62.532243 MB/s
Naturally Ordered Ring Bandwidth:    134.390539 MB/s
Randomly  Ordered Ring Bandwidth:    144.071750 MB/s

I'd appreciate it if somebody could analyze the attached full outputs and 
suggest what I have done wrong.
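In case it helps the analysis, I could also re-run with Open MPI's verbose 
transport selection enabled (parameter name taken from the Open MPI docs, so 
please correct me if it is wrong):

```shell
# Print which BTL components Open MPI considers and selects for each process:
mpirun --mca btl_base_verbose 30 -np 2 ./hpcc 2>&1 | grep -i btl
```

That output should show directly whether the EasyBuild OpenMPI falls back to 
the tcp component.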

These results are in line with previous tests of mine, which showed that 
EasyBuild's toolchain is not practical for solving large linear equation 
systems that span more than one node. As soon as inter-node communication is 
involved, performance drops from 10-100 Gflop/s to 0.1-0.01 Gflop/s and gets 
worse the more nodes are involved, even if the problem is scaled to always 
fill 80% of the nodes' memory.

These are just my initial, discouraging attempts with EasyBuild. I'll be happy 
to find out that the performance problem is due to my own mistake: perhaps I 
have not found the relevant documentation, have forgotten to set some compiler 
flag, or something else.

Thank you
Gunnar

2017-01-28 17:38 GMT+01:00 Kenneth Hoste <[email protected]>:
Hi Gunnar,
On 25/01/2017 19:08, Gunnar Sauer wrote:
Hello Jens,

2017-01-25 13:03 GMT+01:00 Jens Timmerman <[email protected]>:
Hello Gunnar,


On 24/01/2017 19:54, Gunnar Sauer wrote:
> Hello EasyBuild experts,
>

> But which toolchain do I choose on the Xeon cluster, which provides
> all those optimized tools through already existing modules? Can I
> tweak the goolf toolchain to use the existing system modules?
Yes, you could create your own toolchain to use the already existing
modules, this is exactly how the Cray toolchain works, see
http://easybuild.readthedocs.io/en/latest/Using_external_modules.html
for more information on how to create your own toolchain from existing
compilers and libraries.
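A minimal sketch of such a toolchain easyconfig, as a config fragment with 
purely hypothetical system module names and versions (the real names must 
match what `module avail` shows on your cluster), might look like:

```python
# Hypothetical easyconfig: a toolchain wrapping the cluster's own modules.
easyblock = 'Toolchain'

name = 'sysfoss'
version = '2017a'

homepage = '(none)'
description = "Toolchain using the system-provided GCC, OpenMPI and OpenBLAS modules"

toolchain = {'name': 'dummy', 'version': 'dummy'}

# EXTERNAL_MODULE marks modules that exist on the system but were not built by
# EasyBuild; module names/versions below are placeholders.
dependencies = [
    ('gcc/5.4.0', EXTERNAL_MODULE),
    ('openmpi/1.10.3', EXTERNAL_MODULE),
    ('openblas/0.2.18', EXTERNAL_MODULE),
]

moduleclass = 'toolchain'
```

This mirrors how the Cray toolchains are defined; the documentation page above 
describes how to attach metadata to the external modules.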

Ok, I'll try to understand the details of how to set up a new toolchain and go 
down this path. I have found GCC-system, which seems to lead in the right 
direction. Would it be feasible to extend GCC-system with OpenMPI-system 
and OpenBLAS-system in a similar fashion?

The GCC-system easyconfig file leverages the SystemCompiler easyblock.

To also support OpenMPI-system and OpenBLAS-system, a similar SystemLibrary 
easyblock should be created that forces you to specify the required information 
about the system library you would like to use.

Alan's suggestion of just grabbing the module file that is generated using 
"--module-only --force" and adjusting it as needed is a good one though, it may 
take you a long way...
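For example (easyconfig name purely illustrative; pick whichever easyconfig 
matches your system's OpenMPI version):

```shell
# Generate only the module file, without building anything:
eb OpenMPI-1.10.3-GCC-5.4.0.eb --module-only --force

# Then edit the generated module file so its prefix points at the
# system-provided OpenMPI installation.
```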




And yes, these toolchains have infiniband support.

So, it would be very nice to know what optimizations are being done at
your company that make the internal toolchain even better optimized, so
that all EasyBuild users could benefit from this knowledge and potentially
millions of CPU hours could be saved.

I will see whether they share the details with me, or whether they even have 
the details. As I understand it, the cluster was set up and is maintained by an 
external company. When we discussed using the foss stack today, I only got very 
discouraging answers: InfiniBand couldn't be configured correctly using a 
generic MPI installation procedure, BLAS would be an order of magnitude slower 
unless you put in the correct parameters for the specific architecture, etc.
Nevertheless, I am currently trying to set up the HPL benchmark, and I will 
compare the results with EasyBuild's foss toolchain and with the cluster's 
'builtin' toolchain.

I'd be very interested in hearing more about this, i.e. how the benchmark 
results turned out, how the existing toolchains were configured compared to how 
we tackle things in EasyBuild, etc.

It's certainly possible that there was some heavy tuning done w.r.t. 
configuration parameters (in particular for the MPI); the downside of the 
easyconfigs we include in EasyBuild is that we need to keep them generic enough 
so that they'll work out of the box.
For OpenMPI specifically, it makes a lot of sense to tweak the corresponding 
easyconfig file with additional/different system-specific configure options.
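As a hypothetical sketch (configure flag names from the Open MPI docs; adjust 
for your interconnect and scheduler), the tweaked easyconfig could set 
something like:

```python
# Hypothetical additions to the OpenMPI easyconfig for a verbs-based InfiniBand cluster:
configopts = '--with-verbs '                      # build the openib BTL against libibverbs
configopts += '--enable-mpirun-prefix-by-default '
configopts += '--with-slurm '                     # site-specific scheduler integration
```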



I'm really serious here: if you can share this information, we would
love to hear it so we can incorporate it, but I do understand that this
might be proprietary information.

TL;DR:
If you can share your highly optimized toolchains with us, we will be
pleased to support them in EasyBuild if they can help us achieve faster
software runtimes!

Also thanks for the other replies! I need to gain some more experience with 
EasyBuild before I can make use of all your suggestions.

Don't hesitate to let us know if you have any questions!



regards,

Kenneth



------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
