Re: [Beowulf] Theoretical vs. Actual Performance

David Mathog Thu, 22 Feb 2018 09:53:35 -0800

On Thu, 22 Feb 2018 09:37:54 -0500 Prentice Bisbal wrote:

I found literature from AMD stating the
theoretical performance of these processors is 282 GFLOPS, and my
LINPACK performance isn't coming close to that (I get approximately 33%
of that). 
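
For scale, 33% of 282 GFLOPS is roughly 93 GFLOPS. A vendor's theoretical figure is normally just a product of socket count, cores per socket, clock rate, and floating-point operations per cycle per core. The sketch below shows that arithmetic. The hardware numbers in it are hypothetical, chosen only because they land near the quoted figure. They are not taken from this thread or from AMD's literature, and the function names are my own.

```python
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak: sockets x cores x clock (GHz) x FLOPs per cycle per core."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak that a benchmark actually reached."""
    return measured_gflops / peak


# Figures from the quoted message.
QUOTED_PEAK = 282.0        # GFLOPS, as stated in the vendor literature
MEASURED_FRACTION = 0.33   # "approximately 33% of that"

measured = QUOTED_PEAK * MEASURED_FRACTION
print(f"implied LINPACK result: about {measured:.0f} GFLOPS")

# HYPOTHETICAL configuration, for illustration only (an assumption, not from the thread).
hypothetical = peak_gflops(sockets=2, cores_per_socket=16, clock_ghz=2.2, flops_per_cycle=4)
print(f"hypothetical peak: {hypothetical:.1f} GFLOPS")
print(f"efficiency against it: {efficiency(measured, hypothetical):.0%}")
```

Plugging in the real socket count, core count, clock, and FLOPs per cycle for the machine in question shows whether the quoted peak assumes a clock the system actually sustains under load.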


That does seem low.  Check the usual culprits:

1. CPU frequency scaling locked to its lowest setting, or set to a mode which adjusts dynamically and then interacts poorly with the test software. You know that the rated performance will have been measured with the CPU locked to its highest frequency.

2. Something else running, especially something which forces the test program out of memory or file caches. I wouldn't expect this sort of test to be IO bound to disk, but if it is, and hugepages are used, enormous performance drops may be observed when the system decides to move those around. I wouldn't put it past AMD or Intel to run these sorts of tests with the test system stripped down to the bones. No network, no logging, single user, etc. That is, absolutely nothing that would compete for CPU time. (Just checked on one of our big systems. ps -ef | wc shows 953 processes: 48 migration, 48 ksoftirqd, 49 stopper, 49 watchdog, 49 kintegrityd, 49 kblockd, 49 ata_sff, 49 md, 49 md_misc, 49 aio, 49 crypto, 49 kthrotld, 49 rpciod, 19 gdm (console processes, even with no display attached at the moment and nobody logged in there), 193 events, 12 of my processes, and 107 miscellaneous OS processes.)


3. ulimit settings, and the corresponding /etc/security/limits.conf entries.

4. NUMA issues. Multithreaded programs have been observed which allocate a large block of memory once, which ends up on one side of a NUMA system, and then start some or all of the threads on the other. Those on the wrong side will run a variable amount slower than those on the right side. If this is what is going on, locking all threads to the same side of the system (if it has just two sides) can speed things up a bit. That assumes the program isn't supposed to use all threads.

5. Different compiler/optimization. The vendor may have used a binary which was tweaked to the Nth degree, perhaps even using profiling from earlier runs to optimize the final run. If you are using a benchmark number from AMD, see if you can obtain the exact same version of the test software that they used (which may be available), so that you can eliminate this variable. Perhaps wherever they keep that they also have a detailed description of the test system?
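
Items 1, 3 and 4 can be checked quickly from a script before rerunning the benchmark. What follows is a minimal sketch, assuming a Linux system that exposes the usual sysfs files (/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor and /sys/devices/system/node/node*/cpulist). The function names are my own, and which limits matter for a given LINPACK build is an assumption. It only reports what it finds and changes nothing.

```python
import glob
import os
import resource


def read_text(path):
    """Return the stripped contents of a small text file, or None if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def governors(sysfs_root="/sys"):
    """Map each CPU directory name (e.g. 'cpu0') to its cpufreq governor."""
    pattern = os.path.join(
        sysfs_root, "devices", "system", "cpu", "cpu[0-9]*", "cpufreq", "scaling_governor"
    )
    return {path.split(os.sep)[-3]: read_text(path) for path in sorted(glob.glob(pattern))}


def numa_nodes(sysfs_root="/sys"):
    """Map each NUMA node name (e.g. 'node0') to the CPU list the kernel reports."""
    pattern = os.path.join(sysfs_root, "devices", "system", "node", "node[0-9]*", "cpulist")
    return {path.split(os.sep)[-2]: read_text(path) for path in sorted(glob.glob(pattern))}


def format_limit(value):
    """Render a resource limit, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def limits():
    """Return (soft, hard) pairs for a few limits, skipping any the platform lacks."""
    wanted = ("RLIMIT_MEMLOCK", "RLIMIT_STACK", "RLIMIT_NPROC", "RLIMIT_AS")
    return {
        name: resource.getrlimit(getattr(resource, name))
        for name in wanted
        if hasattr(resource, name)
    }


def main():
    govs = governors()
    if govs:
        # Summarize how many CPUs use each governor.
        for governor in sorted(set(govs.values()), key=str):
            count = sum(1 for g in govs.values() if g == governor)
            print(f"governor {governor}: {count} CPUs")
    else:
        print("no cpufreq governor exposed under /sys")
    for name, (soft, hard) in limits().items():
        print(f"{name}: soft={format_limit(soft)} hard={format_limit(hard)}")
    for node, cpus in numa_nodes().items():
        print(f"{node}: CPUs {cpus}")


if __name__ == "__main__":
    main()
```

Run it from the same shell or batch environment that launches the benchmark, since the limits it reports are the ones that process inherits. If more than one node shows up, the listed CPU ranges are the "sides" to keep threads and memory together on.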


Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
