[Bug tree-optimization/82604] [8 Regression] SPEC CPU2006 410.bwaves ~50% performance regression with trunk@253679 when ftree-parallelize-loops is used

alexander.nesterovskiy at intel dot com Fri, 02 Feb 2018 09:34:18 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604


--- Comment #24 from Alexander Nesterovskiy <alexander.nesterovskiy at intel dot com> ---
Yes, it looks like more time is being spent on synchronization.
r256990 really changes the way autopar works:
For r253679...r256989, most of the work was done in the main thread, thread0
(thread0 ~91%, threads 1-3 ~3% each).
For r256990 the distribution is the same as for r253678 (thread0 ~34%,
threads 1-3 ~22% each), but a lot of time is spent spinning.
I've attached a chart comparing r253678 and r256990 on the same time scale
(~0.5 sec).
In both cases, the libgomp.so.1.0.0 code executed in thread1 consists of wait
functions, and for r256990 they are called more often.

Setting OMP_WAIT_POLICY doesn't change much:
for ACTIVE - performance is nearly the same as the default
for PASSIVE - there is a serious performance drop for r256990 (which seems
reasonable given the many thread sleeps/wake-ups)

Changing parloops-schedule also has no positive effect:
r253678 performance is mostly the same for static, guided and dynamic
r256990 performance is best with static, which is the default
