Jeff,

<rant-mode on>

Jeff Squyres wrote:
ignored it whenever presenting competitive data. The 1,000,000th time I saw this, I gave up arguing that our competitors were not being fair and simply changed our defaults to always leave memory pinned for OpenFabrics-based networks.

Instead, you should have told them that caching memory registration is unsafe and asked them why they don't care if their customers don't get the right answer. And then you would follow up by asking if they actually have a way to check that there is no data corruption. It's not really FUD, it's tit for tat :-)

2. Even if you tag someone in public for not being fair, they always say the same thing, "Oh sorry, my mistake" (regardless of whether they actually forgot or did it intentionally). I told several competitors *many times* that they had to use leave_pinned, but in all public comparison numbers, they never did. Hence, they always looked better.

Looked better on what, micro-benchmarks? The same micro-benchmarks that have already been manipulated to death, like OSU using a stream-based bandwidth test to hide the start-up overhead? If the option improves real applications at large, then it should be on by default and there is no debate (users should never have to know about knobs). If it only helps micro-benchmarks, stand your ground and do the right thing. It does not do the community any good if MPI implementations are tuned for a broken micro-benchmark penis contest. If you want to play that game, at least make your own micro-benchmarks.
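
To put rough numbers on the start-up overhead point, here is a toy model. It is not taken from the OSU benchmark itself. It assumes each message pays a fixed start-up cost, and that a streaming test with a window of messages in flight overlaps all but one of those costs with data transfer. The function names, the 5 us overhead, the 1 GB/s link and the window of 64 are all hypothetical.

```python
def serialized_bandwidth(size, overhead, link_bw):
    """Bytes/s when every message waits for the previous one to complete.

    Each message pays the full start-up overhead (seconds) plus its
    wire time (size / link_bw).
    """
    return size / (overhead + size / link_bw)


def streamed_bandwidth(size, overhead, link_bw, window):
    """Bytes/s when `window` messages are posted back to back.

    Simplifying assumption: the start-up overhead is paid once and the
    overhead of later messages is fully hidden behind data transfer.
    """
    return (window * size) / (overhead + window * size / link_bw)


if __name__ == "__main__":
    size = 1024        # bytes per message (hypothetical)
    overhead = 5e-6    # start-up cost per message in seconds (hypothetical)
    link_bw = 1e9      # peak link bandwidth in bytes/s (hypothetical)
    window = 64        # messages in flight in the streaming test (hypothetical)

    one_at_a_time = serialized_bandwidth(size, overhead, link_bw)
    streamed = streamed_bandwidth(size, overhead, link_bw, window)
    print(f"one message at a time: {one_at_a_time / 1e6:.0f} MB/s")
    print(f"window of {window}: {streamed / 1e6:.0f} MB/s")
```

Under these assumptions the streamed figure sits much closer to the link peak for the same message size, which is exactly why it says little about the start-up cost an application pays on each message.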

Believe me, I know what it is to hear technical atrocities from these marketing idiots. There is nothing you can do, they are paid to talk and you are not. In the end, HPC gets what HPC deserves, people should do their homework.

For applications at large, performance gains due to core-binding are suspect. Memory-binding may have more spine, but the OS should already be able to do a good job with NUMA allocation and page migration.

- The Linux scheduler does not/cannot optimize well for many HPC apps; binding definitely helps in many scenarios (not just benchmarks).

Then fix the Linux scheduler. Only the OS scheduler can do a meaningful resource allocation, because it sees everything and you don't.

<rant-mode off>

Patrick
