On 10/18/2016 06:58 PM, Alexander Monakov wrote:

> The currently published OpenMP version of LULESH simply doesn't use openmp-simd
> anywhere. This should make it obvious that it won't be anywhere near any
> reasonable CUDA implementation, and also bound to be below host performance.
> Besides, it's common for such benchmark suites to have very different levels of
> hand tuning for the native-CUDA implementation vs OpenMP implementation,
> sometimes to the point of significant algorithmic differences. So you're
> making an invalid comparison here.

The information I have is that the LULESH code is representative of how at least some groups on the HPC side expect to write OpenMP code. It's the biggest real-world piece of code that I'm aware of that's available for testing, so it seemed like a good thing to try. If you have other real-world tests available, please let us know. If you can demonstrate good performance by modifying LULESH sources, that would also be a good step, although maybe not the ideal case. But I think it's not unreasonable to look for a demonstration that reasonable performance is achievable on something that isn't just a microbenchmark.

I'll refrain from any further comments on the topic. The ptx patches don't look unreasonable iff someone else decides that this version of OpenMP support should be merged, and I'll look into them in more detail if that happens. Patch 2/8 is ok now.


Bernd
