On 10/17/2016 07:06 PM, Alexander Monakov wrote:

I've just pushed two commits to the branch to fix this issue.  Before those, the
last commit left the branch in a state where an incremental build seemed ok
(because libgcc/libgomp weren't rebuilt with the new cc1), but a from-scratch
build was broken like you've shown.  LULESH is known to work.  I also intend to
perform a trunk merge soon.


Ok that did work, however...

I think before merging this work we'll need to have some idea of how well it
works on real-world code.

This patchset and the branch lay the foundation; there's more work to be
done, in particular on the performance improvements side. There should be
an agreement on these fundamental bits first, before moving on to fine-tuning.

The performance I saw was lower by a factor of 80 or so compared to their CUDA version, and even lower than OpenMP on the host. Does this match what you are seeing? Do you have a clear plan how this can be improved?

To me this kind of performance doesn't look like something that will be fixed by fine-tuning; it leaves me undecided whether the chosen approach (what you call the fundamentals) is viable at all. Performance is still better than the OpenACC version of the benchmark, but I think we shouldn't repeat the mistakes we made with OpenACC, and should avoid merging anything until we're sure it's ready and of benefit to users.


Bernd
