Found out what was wrong...
It turned out the hardware was delivered with only 15 of the 16 memory slots
populated!
No wonder we had performance issues!
Anyway, thanks a lot to all who answered!
/jon
On 02/29/2016 06:48 PM, Josef Weidendorfer wrote:
On 28.02.2016 at 16:27, Jon Tegner wrote:
For one of our applications (CFD/OpenFOAM) we have noticed that the
calculation runs faster using 12 cores on 4 nodes than when using
24 cores on 4 nodes.
If this is OpenMP within each node...
This may be an effect of first touch going wrong with lots of remote
memory accesses, especially with huge pages.
Checking with numatop may be useful (it shows how many memory accesses
are local vs. remote); see https://github.com/01org/numatop
Josef
In our environment we also have older AMD hardware (nodes with 4 CPUs
of 12 cores each), and there we don't see these strange scaling issues.
The system is CentOS-7, and communication is over FDR InfiniBand. The BIOS
was recently updated, and hyperthreading is disabled.
I feel a bit lost here, and any hints on how to proceed would be
greatly appreciated!
Thanks,
/jon
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf