https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122356
--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Prathamesh Kulkarni <[email protected]>:

https://gcc.gnu.org/g:304d08fea9e68c5b4806532816b58d33a15a917d

commit r16-6913-g304d08fea9e68c5b4806532816b58d33a15a917d
Author: Matthew Malcomson <[email protected]>
Date:   Tue Jan 20 03:54:51 2026 +0000

libgomp: Ensure memory sync after performing tasks

As described in PR 122356 there is a theoretical bug around not
"publishing" user data written in a task when that task has been
executed by a thread after entry to a barrier.

Key points of the C memory model that are relevant:
1) Memory writes can be seen in a different order in different threads.
2) When one thread (A) reads a value with acquire memory ordering that
   another thread (B) has written with release memory ordering, then all
   data written in thread (B) before the write that set this value will
   be visible to thread (A) after that read.
3) Point 2 requires that the read and the write operate on the same
   variable. The guarantee is one-way: it specifies that thread (A)
   will see the writes that thread (B) has performed before the
   specified write. It does not specify that thread (B) will see writes
   that thread (A) has performed before reading this value.

Outline of the issue:
1) While there is a memory sync at entry to the barrier, user code can
   be run after threads have all entered the barrier.
2) There are various points where a memory sync can occur after entry
   to the barrier:
   - One thread getting the `task_lock` mutex that another thread has
     released.
   - Last thread incrementing `bar->generation` with `MEMMODEL_RELEASE`
     and some other thread reading it with `MEMMODEL_ACQUIRE`.
   However there are code paths that can avoid these points.
3) On the code paths that can avoid these points we could have no
   memory synchronisation between a write to user data that happened in
   a task executed after entry to the barrier, and some other thread
   running the implicit task after the barrier. Hence that "other
   thread" may read a stale value that should have been overwritten in
   the explicit task.

There are two code paths that I believe I've identified:
1) The last thread sees `task_count == 0` and increments the generation
   with `MEMMODEL_RELEASE` before continuing on to the next implicit
   task. If some other thread had executed a task that wrote user data
   I don't see any way in which an acquire-release ordering *from* the
   thread writing user data *to* the last thread would have been formed.
2) After all threads have entered the barrier, some thread (A) is
   waiting in `do_wait`. Some other thread (B) completes a task writing
   user data. Thread (B) increments the generation using
   `gomp_team_barrier_done` (non-atomically, hence not allowing the
   formation of any acquire-release ordering with this write).
   Thread (A) reads the generation with `MEMMODEL_ACQUIRE`, but since
   the write was not atomic that does not form an ordering.

This patch makes two changes:
1) The write that sets `task_count` to 0 in `gomp_barrier_handle_tasks`
   is done atomically, while the read of `task_count` in
   `gomp_team_barrier_wait_end` is also made atomic. This addresses the
   first case by forming an acquire-release ordering *from* the thread
   executing tasks *to* the thread that will increment the generation
   and continue.
2) The write of `bar->generation` via `gomp_team_barrier_done` called
   from `gomp_barrier_handle_tasks` is done atomically. This means that
   it will form an acquire-release synchronisation with the existing
   atomic read of `bar->generation` in the main loop of
   `gomp_team_barrier_wait_end`.

Testing done:
- Bootstrap & regtest on aarch64 and x86_64.
- With & without _LIBGOMP_CHECKING_.
- Testsuite with & without OMP_WAIT_POLICY=passive.
- Cross compilation & regtest on arm.
- TSAN done on this as part of all my upstream patches.

libgomp/ChangeLog:

	PR libgomp/122356
	* config/gcn/bar.c (gomp_team_barrier_wait_end): Atomically read
	team->task_count.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	* config/gcn/bar.h (gomp_team_barrier_done): Atomically write
	bar->generation.
	* config/linux/bar.c (gomp_team_barrier_wait_end): Atomically read
	team->task_count.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	* config/linux/bar.h (gomp_team_barrier_done): Atomically write
	bar->generation.
	* config/posix/bar.c (gomp_team_barrier_wait_end): Atomically read
	team->task_count.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	* config/posix/bar.h (gomp_team_barrier_done): Atomically write
	bar->generation.
	* config/rtems/bar.h (gomp_team_barrier_done): Atomically write
	bar->generation.
	* task.c (gomp_barrier_handle_tasks): Atomically write
	team->task_count when decrementing to zero.
	* testsuite/libgomp.c/pr122356.c: New test.

Signed-off-by: Matthew Malcomson <[email protected]>
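
For readers who want to see the acquire-release rule from the key points in the commit message in isolation, the following is a minimal sketch and not code from the patch. It assumes GCC's `__atomic` builtins and POSIX threads. The names `user_data`, `generation`, `writer_thread`, `waiter_thread` and `run_publish_demo` are hypothetical stand-ins chosen for illustration; they are not libgomp's data structures or functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not libgomp's data structures. */
static int user_data;        /* plain (non-atomic) data written by the "task" */
static unsigned generation;  /* flag accessed only through __atomic builtins */

/* Thread (B): write user data, then publish it with a release store. */
static void *writer_thread(void *arg)
{
  (void) arg;
  user_data = 42;
  __atomic_store_n(&generation, 1u, __ATOMIC_RELEASE);
  return NULL;
}

/* Thread (A): spin until the acquire load observes the release store.
   Once it does, the earlier write to user_data is guaranteed visible. */
static void *waiter_thread(void *arg)
{
  int *seen = arg;
  while (__atomic_load_n(&generation, __ATOMIC_ACQUIRE) == 0u)
    ;
  *seen = user_data;
  return NULL;
}

/* Runs one writer and one waiter; returns the value the waiter observed. */
int run_publish_demo(void)
{
  pthread_t writer, waiter;
  int seen = -1;
  int rc;

  user_data = 0;
  __atomic_store_n(&generation, 0u, __ATOMIC_RELAXED);

  rc = pthread_create(&waiter, NULL, waiter_thread, &seen);
  assert(rc == 0);
  rc = pthread_create(&writer, NULL, writer_thread, NULL);
  assert(rc == 0);
  rc = pthread_join(writer, NULL);
  assert(rc == 0);
  rc = pthread_join(waiter, NULL);
  assert(rc == 0);
  (void) rc;
  return seen;
}
```

Calling `run_publish_demo()` from a `main` built with `-pthread` should return 42, because the acquire load in the waiter reads the value stored by the release store in the writer. If the store to `generation` were a plain assignment instead, no such ordering would be formed and the program would contain a data race, which is the shape of the second code path described in the commit message.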
