Re: [Mingw-w64-public] [PATCH 4/6] winpthreads: Reference _tls_used variable to ensure that GNU linker creates TLS directory

LIU Hao Tue, 21 Oct 2025 20:12:33 -0700

On 2025-10-22 03:58, Pali Rohár wrote:

On Thursday 16 October 2025 20:35:00 LIU Hao wrote:

The last one is unnecessary. Initializing a lock doesn't require an atomic
operation; only passing it to other threads does. And even when it's
necessary to use an atomic operation, `volatile` is not sufficient for
ARM64; it has to be done with `__atomic_store_n` which compiles to an STLR
instruction.


I really was not sure about this one. I was thinking about it...

I do not quite understand why the unlock and init have different
behavior. Both set the spin lock to the unlocked state. The init
function does not use any synchronization or barrier, but the unlock
function uses a barrier with release semantics.


After a lock is initialized, it may be passed to other threads

  * via some shared data structure, with proper locking, or
  * directly or indirectly, as the user-defined argument to `pthread_create()`.

Either operation guarantees that the lock is properly synchronized between these threads, and without such an operation the lock is not accessible to other threads. When the creator thread itself accesses the lock, no synchronization is required.


That is, the lock can be initialized as an ordinary datum.
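
A minimal sketch of the `pthread_create()` case, using hypothetical names (`demo_lock`, `worker`, `run_demo`) that are not taken from winpthreads: the lock is initialized with a plain store, and thread creation itself is what makes that store visible to the new thread.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical lock type for illustration only; 0 means unlocked. */
struct demo_lock { int state; };

static void demo_lock_init(struct demo_lock *l)
{
  /* Plain store: no other thread can reach the lock yet. */
  l->state = 0;
}

/* Runs in the new thread. Returns its argument if it found the lock
   unlocked and acquired it, or NULL otherwise. */
static void *worker(void *arg)
{
  struct demo_lock *l = arg;
  int old = __atomic_exchange_n(&l->state, 1, __ATOMIC_ACQUIRE);
  return old == 0 ? arg : NULL;
}

int run_demo(void)
{
  struct demo_lock lock;
  pthread_t t;
  void *result = NULL;

  demo_lock_init(&lock);

  /* pthread_create() synchronizes memory, so the plain store in
     demo_lock_init() is visible inside worker(). */
  int rc = pthread_create(&t, NULL, worker, &lock);
  assert(rc == 0);

  rc = pthread_join(t, &result);
  assert(rc == 0);

  /* 1 if the worker observed the initialized (unlocked) state. */
  return result == &lock;
}
```

Only the accesses made after the lock has become shared need atomic operations; the initializing store does not.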

Now that I think more about it, is it really required for init
and unlock to use a barrier when the counterpart function (the lock one)
always uses a full memory barrier? (InterlockedExchangePointer uses a
full memory barrier, right?)

The lock operation should probably be an acquire barrier instead of a full barrier, but on x86 an atomic read-modify-write operation is always a full barrier.

On ARM64 the memory order applies to the write part of the atomic operation: https://gcc.godbolt.org/z/GWev5vWqh
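
For illustration, the lock, trylock and unlock operations could be written with the GCC `__atomic` builtins roughly as below. This is only a sketch with hypothetical names (`sketch_spin_*`) and a plain `int` lock word; it is not the winpthreads code, which as discussed above uses `InterlockedExchangePointer`.

```c
#include <stdbool.h>

/* Hypothetical lock word for illustration: 0 = unlocked, 1 = locked. */
typedef int sketch_spin_t;

/* Acquire is sufficient when taking the lock: accesses inside the
   critical section cannot be moved before the exchange. */
static void sketch_spin_lock(sketch_spin_t *l)
{
  while (__atomic_exchange_n(l, 1, __ATOMIC_ACQUIRE) != 0)
    ;  /* spin until the previous value was 0 */
}

/* Same exchange, attempted once; true means the lock was taken. */
static bool sketch_spin_trylock(sketch_spin_t *l)
{
  return __atomic_exchange_n(l, 1, __ATOMIC_ACQUIRE) == 0;
}

/* Release: accesses inside the critical section cannot be moved
   after this store. */
static void sketch_spin_unlock(sketch_spin_t *l)
{
  __atomic_store_n(l, 0, __ATOMIC_RELEASE);
}
```

None of the parameters is declared `volatile`; the ordering comes from the memory-order argument of each builtin.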

AFAIK volatile just ensures that the compiler does not reorder emitted
instructions as part of some compiler optimizations. But volatile does
not ensure any synchronization or barrier at the HW level. x86 has strong
ordering where I think that only a store followed by a load can be reordered
without an explicit barrier. So my understanding is that volatile on x86
has the side effect of a barrier (which does not apply to arm).


This looks mostly correct.

`volatile` means the operation has an effect that is unknown to the compiler, as if it was accessing global memory; and that's why it could be abused for synchronization. I think we had better not abuse `volatile` for this purpose.

But isn't there some possibility that the compiler could reorder something?

Initialization (of fields of a struct, for example) can happen in any order, or even be combined into SIMD operations. It doesn't matter.

What makes me even more suspicious: why do the lock and unlock functions
mark the memory that tk points to as volatile, but the trylock and init
functions do not? I would expect that at least the trylock and lock
functions would declare the variable in the same way, since the
variable is passed to the InterlockedExchangePointer() function as is.


Those need not be `volatile`.


--
Best regards,
LIU Hao

