https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122356
--- Comment #2 from Matthew Malcomson <matmal01 at gcc dot gnu.org> ---
(In reply to Matthew Malcomson from comment #1)
> I am leaning towards approach (B) because it feels like the most robust
> (always using the same code flow to ensure the synchronization).
>
> I don't think that performance would be hit much by the last thread going
> into `gomp_barrier_handle_tasks` and seeing no tasks to perform when there
> instead of seeing directly in `gomp_team_barrier_wait_end`.

Uhh, I just actually tested this on the minimal benchmark I have access to. It does nothing with tasks, so I reasoned it would be the least likely to show any effect, and it is the benchmark I've been trying to optimize for the last few months. It turns out that going into `gomp_barrier_handle_tasks` has a noticeable (though not huge) effect, so I would now lean towards option (A).
