And due to the out-of-order execution (OoOE) of modern microarchitectures, with
acquire/release the load can even be executed before the store, since there is
no RAW (read-after-write) dependency between them.
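As an illustrative sketch (the fields X and Y are hypothetical, not from the thread), JDK 9+ VarHandles make the two orderings explicit: with setRelease/getAcquire nothing forbids the load from passing the store, while setVolatile/getVolatile puts a full barrier between them.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical two-variable example: a store to X followed by a load of Y.
class StoreLoadDemo {
    int x, y; // plain fields; the access mode is chosen per call site

    static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findVarHandle(StoreLoadDemo.class, "x", int.class);
            Y = l.findVarHandle(StoreLoadDemo.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Release store then acquire load: there is no StoreLoad barrier between
    // them, so the load may be executed before the store becomes visible.
    int relAcq() {
        X.setRelease(this, 10);
        return (int) Y.getAcquire(this);
    }

    // Volatile store then volatile load: HotSpot emits a StoreLoad barrier
    // after the store, so the load cannot pass it.
    int vol() {
        X.setVolatile(this, 10);
        return (int) Y.getVolatile(this);
    }
}
```

Single-threaded both variants of course return the same result; the difference only shows up as a possible reordering observed by another thread.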



On Mon, Feb 17, 2025 at 10:10 AM Peter Veentjer <[email protected]>
wrote:

> For HotSpot on x86, release/acquire vs volatile can make a difference.
>
> Imagine you would have a:
>
> A=10
> r1=B
>
> So we have a store to A and a load of B.
>
> On x86, every store is a release store and every load is an acquire load.
>
> On x86, a store can be reordered with a later load due to the store
> buffer. So A=10 and r1=B could be reordered.
>
> If A and B were volatile, then this reordering wouldn't be allowed. The
> reason is that a program without data races may only have sequentially
> consistent executions. And for an execution to be sequentially consistent,
> it needs to have the same effect as if all the threads ran their
> operations in program order (so no reordering).
>
> To prevent the store and the load from being reordered, a [StoreLoad]
> barrier needs to be inserted (e.g. in the form of an MFENCE or a
> LOCK-prefixed instruction):
>
> A=10
> [StoreLoad]
> r1=B
>
> This [StoreLoad] effectively stalls the execution of the load (r1=B) until
> the store (A=10) in the store buffer has been drained to the coherent cache,
> and this can take some time: there could be many queued stores in the
> store buffer, all waiting for their cache line to be returned in the right
> state.
>
> Without the [StoreLoad], the load can be performed while the store is still
> in the store buffer.
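In JDK 9+ the [StoreLoad] above can be requested explicitly with VarHandle.fullFence() (which HotSpot compiles to a LOCK-prefixed instruction or MFENCE on x86). A minimal sketch, with hypothetical fields A and B standing in for the example's variables:

```java
import java.lang.invoke.VarHandle;

class StoreLoadFence {
    static int A, B; // hypothetical shared variables from the example above

    static int storeThenLoad() {
        A = 10;                // plain store; may sit in the store buffer
        VarHandle.fullFence(); // [StoreLoad]: on x86, drains the store buffer
        return B;              // the load cannot be satisfied before the
                               // store has become globally visible
    }
}
```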
>
>
>
>
>
> On Mon, Feb 17, 2025 at 9:34 AM Daniel Marques <[email protected]>
> wrote:
>
>> Thanks for the response.  I hope you don't mind a few follow ups:
>>
>> Is there a "for dummies" which describes the difference between
>> Release/Acquire vs Volatile?  For Hotspot and x86-64, are there actual
>> differences in implementation, and measurable performance using
>> Release/Acquire vs Volatile?
>>
>> Again, thanks in advance.
>>
>> Dan
>>
>>
>> On Wed, Feb 12, 2025 at 2:05 PM Peter Veentjer <[email protected]>
>> wrote:
>>
>>> Yes, it is the same.
>>>
>>> You could even go for:
>>>
>>> class ExampleTwo {
>>>
>>>      void threadOne() {
>>>           dataBuffer.putInt(valueOffset, 100);
>>>           Unsafe.putIntRelease(null, dataBufferAddr + readyOffset, 1);
>>>      }
>>>
>>>      void threadTwo() {
>>>           while (Unsafe.getIntAcquire(null, dataBufferAddr + readyOffset) == 0)
>>>                 ;
>>>           assert dataBuffer.getInt(valueOffset) == 100;
>>>      }
>>> }
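For anyone avoiding Unsafe: roughly the same release/acquire protocol can be written with a byte-buffer view VarHandle (JDK 9+). This is only a sketch, not the poster's code; the direct buffer stands in for the MappedByteBuffer, and the atomic access modes require a direct buffer with aligned offsets:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of the flag protocol without Unsafe, via a byte-buffer view
// VarHandle. A MappedByteBuffer is a direct buffer, so the same access
// modes apply to it.
class ExampleTwoVarHandle {
    static final VarHandle INT = MethodHandles.byteBufferViewVarHandle(
            int[].class, ByteOrder.nativeOrder());

    final ByteBuffer dataBuffer = ByteBuffer.allocateDirect(8); // stand-in for fc.map(...)
    final int valueOffset = 0;
    final int readyOffset = 4;

    void threadOne() {
        dataBuffer.putInt(valueOffset, 100);        // plain store of the payload
        INT.setRelease(dataBuffer, readyOffset, 1); // release store of the flag
    }

    void threadTwo() {
        while ((int) INT.getAcquire(dataBuffer, readyOffset) == 0)
            ; // spin until the flag is published
        assert dataBuffer.getInt(valueOffset) == 100;
    }
}
```

Note that for cross-process sharing via a mapped file this only orders the stores; both processes must map the same region and agree on the offsets.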
>>>
>>>
>>> On Wed, Feb 12, 2025 at 5:55 PM Daniel Marques <[email protected]>
>>> wrote:
>>>
>>>> I'm very new to both offheap allocations and the JMM, etc., so forgive
>>>> the perhaps naive question.
>>>>
>>>> The introduction material to the JMM typically presents the following
>>>> example of a correct program, assuming the two methods are executed
>>>> concurrently in different threads.
>>>>
>>>> class ExampleOne {
>>>>      volatile int ready;
>>>>      int value;
>>>>
>>>>      void threadOne() {
>>>>           value = 100;
>>>>           ready = 1;
>>>>      }
>>>>
>>>>      void threadTwo() {
>>>>           while (ready == 0)
>>>>                 ;
>>>>           assert value == 100;
>>>>      }
>>>> }
>>>>
>>>> Is the following semantically equivalent when the two methods are run
>>>> in different processes, or are there any additional operations necessary
>>>> to 'coordinate' between two processes sharing memory (assuming
>>>> JDK >= 9)?
>>>>
>>>> class ExampleTwo {
>>>>       MappedByteBuffer dataBuffer;
>>>>       long dataBufferAddr;
>>>>       int valueOffset = 0;
>>>>       int readyOffset = 4;
>>>>
>>>>       ExampleTwo() {
>>>>             File file = new File("foo.dat");
>>>>             FileChannel fc = new ...
>>>>             dataBuffer = fc.map(READ_WRITE, 0, 2 * Integer.BYTES);
>>>>             dataBufferAddr = Unsafe.magic(dataBuffer); // I'm actually
>>>> using Agrona's UnsafeBuffer to do all the magic for me
>>>>       }
>>>>
>>>>      void threadOne() {
>>>>           dataBuffer.putInt(valueOffset, 100);
>>>>           Unsafe.putIntVolatile(null, dataBufferAddr + readyOffset, 1);
>>>>      }
>>>>
>>>>      void threadTwo() {
>>>>           while (Unsafe.getIntVolatile(null, dataBufferAddr + readyOffset) == 0)
>>>>                 ;
>>>>           assert dataBuffer.getInt(valueOffset) == 100;
>>>>      }
>>>> }
>>>>
>>>> Thanks in advance,
>>>>
>>>> Dan
>>>>
>>>> On Tue, Feb 11, 2025 at 6:34 AM Peter Veentjer <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks a lot for your answer and for the confirmation that my
>>>>> understanding is correct.
>>>>>
>>>>> On Wed, Feb 5, 2025 at 12:30 PM Aleksey Shipilev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> On 2/3/25 12:06, Peter Veentjer wrote:
>>>>>> > Imagine the following code:
>>>>>> >
>>>>>> > ... lots of writes to the buffer
>>>>>> > buffer.putInt(a_offset,a_value)  (1)
>>>>>> > buffer.putRelease(b_offset,b_value) (2)
>>>>>> > releaseFence() (3)
>>>>>> > buffer.putInt(c_offset,c_value) (4)
>>>>>> >
>>>>>> > Buffer is a chunk of memory that is shared with another process and
>>>>>> the writes need to be seen in
>>>>>> > order. So when 'b' is seen, 'a' should be seen. And when 'c' is
>>>>>> seen, 'b' should be seen. There is
>>>>>> > no other synchronization.
>>>>>> >
>>>>>> > All offsets are guaranteed to be naturally aligned. All the putInts
>>>>>> are plain puts (using Unsafe).
>>>>>> >
>>>>>> > The putRelease (2) will ensure that 'a' is seen before 'b', and it
>>>>>> will ensure atomicity and
>>>>>> > visibility of 'b' (so the appropriate compiler and memory fences
>>>>>> where needed).
>>>>>> >
>>>>>> > The releaseFence (3) will ensure that b is seen before c.
>>>>>>
>>>>>> Looks to me this fence can be replaced with releasing store of "c":
>>>>>>
>>>>>>   buffer.putInt(a_offset,a_value)
>>>>>>   buffer.putRelease(b_offset,b_value)
>>>>>>   buffer.putRelease(c_offset,c_value)
>>>>>>
>>>>>> My preference is almost always to avoid the explicit fences if you
>>>>>> can control the memory ordering
>>>>>> of the actual accesses. Using putRelease instead of explicit fence
>>>>>> also forces you think about the
>>>>>> symmetries: should all loads of "c" be performed with getAcquire to
>>>>>> match the putRelease?
>>>>>>
>>>>>> > My question is about (4). Since it is a plain store, the compiler
>>>>>> can do a ton of trickery including
>>>>>> > the delay of visibility of (4). Is my understanding correct and is
>>>>>> there anything else that could go
>>>>>> > wrong?
>>>>>>
>>>>>> The common wisdom is indeed "let's put non-plain memory access mode,
>>>>>> so the access is hopefully more
>>>>>> prompt", but I have not seen any of these effects thoroughly
>>>>>> quantified beyond "let's forbid the
>>>>>> compiler to yank our access out of the loop". Maybe I have not looked
>>>>>> hard enough.
>>>>>>
>>>>>> I suspect the delays introduced by the compiler moving code around in
>>>>>> sequential code streams are on a
>>>>>> scale where they do not matter all that much for end-to-end latency.
>>>>>> The only (?) place where code
>>>>>> movement impact could be multiplied to a macro-effect is when the
>>>>>> memory ops shift in/out/around the
>>>>>> loops. I would not be overly concerned about latency impact of
>>>>>> reordering within the short straight
>>>>>> code stream.
>>>>>>
>>>>>> You can try to measure it with producer-consumer / ping-pong style
>>>>>> benchmarks: put more memory ops
>>>>>> around (4), turn on instruction scheduler randomizers (-XX:+StressLCM
>>>>>> should be useful here, maybe
>>>>>> -XX:+StressGCM), see if there is an impact. I suspect the effect is
>>>>>> too fine-grained to be
>>>>>> accurately measured with direct timing measurements, so you'll need
>>>>>> to get creative how to measure
>>>>>> "promptness".
>>>>>>
>>>>>> > What would be the lowest memory access mode that would resolve this
>>>>>> problem? My guess is that the
>>>>>> > last putInt, should be a putIntOpaque.
>>>>>>
>>>>>> Yes, in current Hotspot, opaque would effectively pin the access in
>>>>>> place, so it would be exposed to
>>>>>> hardware in the order closer to original source code order. Then it
>>>>>> is up to hardware to see when to
>>>>>> perform the store. But as I said above, I'll be surprised if it
>>>>>> actually matters.
>>>>>>
>>>>>> Thanks,
>>>>>> -Aleksey
>>>>>>
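For reference, the opaque mode discussed above is available without Unsafe via VarHandle.setOpaque (JDK 9+). A minimal sketch, with a hypothetical field c standing in for the buffer slot at c_offset:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class OpaqueDemo {
    int c; // hypothetical field standing in for the slot at c_offset
    static final VarHandle C;
    static {
        try {
            C = MethodHandles.lookup()
                    .findVarHandle(OpaqueDemo.class, "c", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publishC(int v) {
        // Opaque: the compiler may not elide or hoist the access, so it is
        // issued in roughly source order; the hardware remains free to
        // schedule it (no fences are emitted).
        C.setOpaque(this, v);
    }
}
```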
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "mechanical-sympathy" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion, visit
>>>>> https://groups.google.com/d/msgid/mechanical-sympathy/CAGuAWdAsWprk9BK46iJdZ_w1wPBcM4OCkDgCLTAP98B4VCPscw%40mail.gmail.com
>>>>> <https://groups.google.com/d/msgid/mechanical-sympathy/CAGuAWdAsWprk9BK46iJdZ_w1wPBcM4OCkDgCLTAP98B4VCPscw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "mechanical-sympathy" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion, visit
>>>> https://groups.google.com/d/msgid/mechanical-sympathy/CAO%3DkmEbpAvVtsnjCQn%2BUShRPa%2B8uJgCgGj9OvOcUTzs9gh%2BXOQ%40mail.gmail.com
>>>> <https://groups.google.com/d/msgid/mechanical-sympathy/CAO%3DkmEbpAvVtsnjCQn%2BUShRPa%2B8uJgCgGj9OvOcUTzs9gh%2BXOQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "mechanical-sympathy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion, visit
>>> https://groups.google.com/d/msgid/mechanical-sympathy/CAGuAWdDAHj3hbrMWo7Y4ik62JB5GDsVfnq%2BsGKFBdCFNZ6O9hw%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/mechanical-sympathy/CAGuAWdDAHj3hbrMWo7Y4ik62JB5GDsVfnq%2BsGKFBdCFNZ6O9hw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "mechanical-sympathy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion, visit
>> https://groups.google.com/d/msgid/mechanical-sympathy/CAO%3DkmEbWqy1hDvVh9xuY5wEE9wr5wfO90wB2i9Tw1QL1%2BbtR8A%40mail.gmail.com
>> <https://groups.google.com/d/msgid/mechanical-sympathy/CAO%3DkmEbWqy1hDvVh9xuY5wEE9wr5wfO90wB2i9Tw1QL1%2BbtR8A%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>
