--- Comment #24 from Alexander Nesterovskiy <alexander.nesterovskiy at intel
dot com> ---
Yes, it looks like more time is being spent in synchronizing.
r256990 really changes the way autopar works:
For r253679...r256989 most of the work was done in the main thread (thread0
~91%, threads1-3 ~3% each).
For r256990 the distribution is the same as for r253678 (thread0 ~34%,
threads1-3 ~22% each), but a lot of time is being spent spinning.
I've attached a chart comparing r253678 and r256990 on the same time scale.
In both cases the libgomp.so.1.0.0 code executed in thread1 is wait functions,
and for r256990 they are called more often.
Setting OMP_WAIT_POLICY doesn't change much:
for ACTIVE - performance is nearly the same as with the default
for PASSIVE - there is a serious performance drop for r256990 (which looks
reasonable given the large number of thread sleeps/wake-ups)
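
For anyone who wants to repeat the OMP_WAIT_POLICY comparison, a rough sketch
of a timing helper follows. It is not the script used for the numbers above:
the helper names are made up for illustration, and the command to run (the
autopar-built benchmark binary) has to be supplied by the caller.

```python
import os
import subprocess
import time

# None means "leave OMP_WAIT_POLICY unset", i.e. the libgomp default.
POLICIES = (None, "ACTIVE", "PASSIVE")


def env_for_policy(policy, base=None):
    """Return a copy of the environment with OMP_WAIT_POLICY set or removed."""
    env = dict(os.environ if base is None else base)
    env.pop("OMP_WAIT_POLICY", None)
    if policy is not None:
        env["OMP_WAIT_POLICY"] = policy
    return env


def time_run(cmd, policy):
    """Run cmd once under the given wait policy; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env_for_policy(policy), check=True)
    return time.perf_counter() - start


def compare(cmd):
    """Time cmd once per policy; keys are 'default', 'ACTIVE', 'PASSIVE'."""
    return {policy or "default": time_run(cmd, policy) for policy in POLICIES}
```

Calling compare(["./benchmark"]) (placeholder path) for binaries built at each
revision would give one timing per policy; a single run per setting is noisy,
so repeating the runs is advisable.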
Changing parloops-schedule also has no positive effect:
r253678 performance is mostly the same for static, guided and dynamic
r256990 performance is best with static, which is default
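
The schedule comparison can be reproduced by rebuilding with a different
--param parloops-schedule value each time. The sketch below only assembles the
command lines. The compiler driver, the -O2 level, the source file name and
the helper name are assumptions for illustration, not the actual benchmark
build flags; the thread count of 4 matches thread0 plus threads1-3 above.

```python
SCHEDULES = ("static", "guided", "dynamic")


def autopar_command(source, schedule, threads=4, driver="gcc"):
    """Build a command line enabling autopar with the given loop schedule."""
    if schedule not in SCHEDULES:
        raise ValueError(f"unsupported schedule: {schedule!r}")
    return [
        driver,
        "-O2",  # assumed optimization level
        f"-ftree-parallelize-loops={threads}",
        "--param",
        f"parloops-schedule={schedule}",
        source,
        "-o",
        f"bench-{schedule}",
    ]


def all_commands(source, threads=4, driver="gcc"):
    """Return one command line per schedule, keyed by schedule name."""
    return {s: autopar_command(source, s, threads, driver) for s in SCHEDULES}
```

Each resulting command can be passed to subprocess.run as-is, giving one
binary per schedule to time against the others.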