On Thu, 22 Feb 2018 09:37:54 -0500 Prentice Bisbal wrote:

I found literature from AMD stating the
theoretical performance of these processors is 282 GFLOPS, and my
LINPACK performance isn't coming close to that (I get approximately 33%
of that). 

That does seem low.  Check the usual culprits:

1. CPU frequency scaling locked to the lowest setting, or set to a governor that adjusts the frequency dynamically and interacts poorly with the test software. The rated performance will almost certainly have been measured with the CPU locked to its highest frequency.

2. Something else running, especially something that forces the test program out of memory or out of the file caches. I wouldn't expect this sort of test to be IO bound to disk, but if it is, and hugepages are used, enormous performance drops may be observed when the system decides to move those around. I wouldn't put it past AMD or Intel to run these sorts of tests with the test system stripped down to the bones: no network, no logging, single user, and so on. That is, absolutely nothing that would compete for CPU time. (I just checked on one of our big systems. ps -ef | wc shows 953 processes: 48 migration, 48 ksoftirqd, 49 stopper, 49 watchdog, 49 kintegrityd, 49 kblockd, 49 ata_sff, 49 md, 49 md_misc, 49 aio, 49 crypto, 49 kthrotld, 49 rpciod, 19 gdm (console processes, even with no display attached at the moment and nobody logged in there), 193 events, 12 of my processes, and 107 miscellaneous OS processes.)

3. ulimit settings, including anything set in /etc/security/limits.conf.

4. NUMA issues. Multithreaded programs have been observed that allocate a large block of memory once, which ends up on one side of a NUMA system, and then start some or all of their threads on the other side. Threads on the wrong side will run a variable amount slower than those on the right side. If this is what is going on, locking all threads to the same side of the system (if it has just two sides) can speed things up a bit, assuming the test isn't supposed to use all of the hardware threads.

5. Different compiler/optimization. The vendor may have used a binary that was tweaked to the Nth degree, perhaps even using profiling from earlier runs to optimize the final run. If you are using a benchmark number from AMD, see if you can obtain the exact same version of the test software that they used (it may be available), so that you can eliminate this variable. Wherever they keep it, they may also have a detailed description of the test system.
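
For items 1 and 3, something like the following Python sketch could be used to see what the node is actually doing. It assumes a Linux system that exposes cpufreq under /sys/devices/system/cpu (not every kernel or VM does), and the choice of which limits to print is only my guess at the ones most likely to matter for a LINPACK run. The function names are made up for this example.

```python
import glob
import resource
from collections import Counter

# Usual Linux cpufreq sysfs location (an assumption; absent without cpufreq).
GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"


def governor_counts(pattern=GOVERNOR_GLOB):
    """Return a Counter mapping governor name -> number of cores using it."""
    counts = Counter()
    for path in glob.glob(pattern):
        try:
            with open(path) as fh:
                counts[fh.read().strip()] += 1
        except OSError:
            continue  # core offline or file unreadable; skip it
    return counts


def limit_report(names=("RLIMIT_MEMLOCK", "RLIMIT_STACK", "RLIMIT_AS", "RLIMIT_NPROC")):
    """Return {limit name: (soft, hard)}, showing RLIM_INFINITY as 'unlimited'."""
    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    report = {}
    for name in names:
        res = getattr(resource, name, None)
        if res is None:
            continue  # this limit is not defined on this platform
        soft, hard = resource.getrlimit(res)
        report[name] = (show(soft), show(hard))
    return report


if __name__ == "__main__":
    print("governors:", dict(governor_counts()) or "cpufreq not exposed")
    for name, (soft, hard) in limit_report().items():
        print(f"{name}: soft={soft} hard={hard}")
```

If most cores report a governor other than "performance", or a limit is small where you expected "unlimited", that is worth ruling out before looking any further. Run it from the same shell or batch environment that launches the benchmark, since limits are inherited per process.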

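For item 4, here is a similarly rough sketch that lists which cores belong to which NUMA node and can restrict a process to one node's cores. It assumes the Linux sysfs layout under /sys/devices/system/node, and the helper names are hypothetical. Note that os.sched_setaffinity only controls where threads run; it does not bind memory, so a tool such as numactl (for example with --cpunodebind and --membind) is probably the better way to launch the real benchmark.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def numa_nodes(root="/sys/devices/system/node"):
    """Return {node id: [cpu ids]}; empty if no NUMA information is exposed."""
    nodes = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "cpulist")):
        name = os.path.basename(os.path.dirname(path))  # e.g. 'node0'
        with open(path) as fh:
            nodes[int(name[4:])] = parse_cpulist(fh.read())
    return nodes


def pin_to_node(node_id, nodes):
    """Restrict the calling thread to one node's cores (Linux only).

    Threads and child processes created afterwards inherit the mask.
    This affects CPU placement only, not where memory is allocated.
    """
    cpus = nodes.get(node_id)
    if not cpus:
        raise ValueError(f"node {node_id} has no CPUs listed")
    os.sched_setaffinity(0, cpus)


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: cpus {cpus}")
    if hasattr(os, "sched_getaffinity"):
        print("currently allowed cpus:", sorted(os.sched_getaffinity(0)))
```

Calling pin_to_node(0, numa_nodes()) before starting the worker threads would keep them all on one side, which is only a sensible experiment when the run is not meant to occupy the whole machine.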

David Mathog
Manager, Sequence Analysis Facility, Biology Division, Caltech
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing