https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122356
--- Comment #1 from Matthew Malcomson <matmal01 at gcc dot gnu.org> ---
Just wanted to ask for feedback on the approaches to fixing this that I can see.

Hoping for some feedback on:
1) Whether people agree that this is a problem or not (as I mentioned, I haven't managed to actually trigger a bug based on it, so I'm still thinking there's a possibility that I'm misreading the code).
2) If so, which approach would be best to fix it.

I believe there are two instances of a thread that has just run a task "publishing" data that need to be addressed:
1. If that thread then calls `gomp_team_barrier_done` after all tasks are finished, where this updates the generation and another thread was in `do_wait` inside `gomp_team_barrier_wait_end`.
2. If that thread has decremented `task_count` to zero, which lets the "last" thread wake all other threads without ever entering `gomp_barrier_handle_tasks`.

Instance 1 above should be relatively simple to fix by adjusting `gomp_team_barrier_done` to store with `MEMMODEL_RELEASE`.

Instance 2 requires a choice that I'd appreciate feedback on. I can imagine two approaches:
A) Ensure that the `task_count` decrement in `gomp_barrier_handle_tasks` is atomic with `MEMMODEL_RELEASE` and update the `task_count` read in the barrier to be atomic with `MEMMODEL_ACQUIRE`.
B) Ensure that the "last" thread always goes through `gomp_barrier_handle_tasks` (synchronizing on the same point as instance 1 above).

I am leaning towards approach (B) because it feels like the most robust (always using the same code flow to ensure the synchronization). I don't think that performance would be hit much by the last thread going into `gomp_barrier_handle_tasks` and seeing there that no tasks remain, instead of seeing that directly in `gomp_team_barrier_wait_end`. (My guess is that if there are no tasks remaining then the mutex won't be held for much longer, if at all.) Obviously I would run performance measurements to check this if I went with it.
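
To make the pairing in approach (A) concrete, here is a minimal standalone sketch using C11 atomics and pthreads rather than libgomp's own atomics and barrier code. Only the name `task_count` echoes the real field; `task_result`, `run_task`, `wait_for_tasks` and `run_demo` are hypothetical names for illustration, and this is not how libgomp is actually structured.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration; not libgomp's real data.  */
static int task_result;             /* plain data "published" by a task */
static atomic_uint task_count = 1;  /* number of outstanding tasks */

/* Worker: runs the task body, then decrements the count with release
   semantics so the plain store above is ordered before the decrement.  */
static void *
run_task (void *arg)
{
  (void) arg;
  task_result = 42;
  atomic_fetch_sub_explicit (&task_count, 1, memory_order_release);
  return NULL;
}

/* "Last" thread: the acquire load pairs with the release decrement, so
   once zero is observed the task's plain store is visible as well.  */
static int
wait_for_tasks (void)
{
  while (atomic_load_explicit (&task_count, memory_order_acquire) != 0)
    ;  /* busy-wait, only to keep the sketch short */
  return task_result;
}

/* Runs one task on a worker thread and returns what the waiter saw,
   or -1 if the thread could not be created.  */
int
run_demo (void)
{
  pthread_t worker;

  task_result = 0;
  atomic_store_explicit (&task_count, 1, memory_order_relaxed);
  if (pthread_create (&worker, NULL, run_task, NULL) != 0)
    return -1;

  int seen = wait_for_tasks ();
  pthread_join (worker, NULL);
  return seen;
}
```

With both accesses relaxed (or non-atomic, as a plain read would be), nothing would order the read of `task_result` after the task's write to it, which is the kind of gap described for instance 2.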
