On 28.02.2016 at 16:27, Jon Tegner wrote:
> For one of our applications (CFD/OpenFOAM) we have noticed that the
> calculation runs faster using 12 cores on 4 nodes compared to when using
> 24 cores on 4 nodes.

If this is OpenMP within each node, this may be an effect of first touch going wrong, resulting in lots of remote memory accesses, especially with huge pages. Checking with numatop may be useful (how many local vs. remote accesses), see
https://github.com/01org/numatop

Josef

>
> In our environment we also have older AMD hardware (nodes with 4 CPUs
> with 12 cores each), and here we don't see these strange scaling issues.
>
> System is CentOS-7, and communication is over FDR Infiniband. BIOS is
> recently updated, and hyperthreading is disabled.
>
> Feel a bit lost here, and any hints on how to proceed with this are
> greatly appreciated!
>
> Thanks,
>
> /jon
> _______________________________________________
> Beowulf mailing list, [email protected] sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
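
To illustrate the first-touch point from the reply above, here is a minimal sketch in C with OpenMP. It is not taken from OpenFOAM or from the original poster's setup. The function names `alloc_first_touch` and `scale_and_sum` are made up for illustration, and it assumes the usual Linux default policy, under which a page is placed on the NUMA node of the thread that first writes to it. The idea is to initialise the data in parallel with the same static schedule as the later compute loop, so each thread's pages are likely to end up in its local memory.

```c
/* First-touch sketch (illustrative only; function names are hypothetical).
 * Assumption: default Linux policy, where a page is placed on the NUMA node
 * of the thread that first writes to it, not the thread that called malloc. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n doubles and initialise them in parallel, so that each thread
 * first-touches the chunk it will later work on. */
double *alloc_first_touch(size_t n)
{
    assert(n > 0);
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* Static schedule: thread t gets one contiguous chunk of iterations. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = 0.0;

    return a;
}

/* Compute loop using the same static schedule, so each thread mostly
 * accesses pages it touched first. Adds `add` to every element and
 * returns the sum of the updated array. */
double scale_and_sum(double *a, size_t n, double add)
{
    double sum = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        a[i] += add;
        sum += a[i];
    }
    return sum;
}
```

With GCC this would be built with `-fopenmp`. The placement only helps if threads stay where they are, so pinning them (for example with the standard OpenMP environment variables `OMP_PROC_BIND=true` and `OMP_PLACES=cores`) is usually needed as well. If the array were instead zeroed by a single thread, all of its pages would tend to land on one node, and threads on the other socket would then see mostly remote accesses.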
