On 10/17/2016 07:06 PM, Alexander Monakov wrote:
I've just pushed two commits to the branch to fix this issue. Before those, the
last commit left the branch in a state where an incremental build seemed ok
(because libgcc/libgomp weren't rebuilt with the new cc1), but a from-scratch
build was broken like you've shown. LULESH is known to work. I also intend to
perform a trunk merge soon.
OK, that did work, however...
I think before merging this work we'll need to have some idea of how well it
works on real-world code.
This patchset and the branch lay the foundation; there's more work to be
done, in particular on the performance improvements side. There should be
an agreement on these fundamental bits first, before moving on to fine-tuning.
The performance I saw was lower by a factor of 80 or so compared to
their CUDA version, and even lower than OpenMP on the host. Does this
match what you are seeing? Do you have a clear plan how this can be
To me this kind of performance doesn't look like something that will be
fixed by fine-tuning; it leaves me undecided whether the chosen approach
(what you call the fundamentals) is viable at all. Performance is still
better than the OpenACC version of the benchmark, but then I think we
shouldn't repeat the mistakes we made with OpenACC, and should avoid merging
anything until we're sure it's ready and of benefit to users.