Hi,

This is correct. If the application uses ODP constructs to synchronize (queues, 
locks, barriers, …), there is no need to make odp_sync_stores() calls, no matter 
which part of coherent memory was modified. If the application uses some other 
method to synchronize (a global variable, a file, etc.), then an explicit sync 
is needed.

-Petri

From: [email protected] 
[mailto:[email protected]] On Behalf Of ext Ola Liljedahl
Sent: Wednesday, September 03, 2014 11:59 AM
To: Bala Manoharan
Cc: [email protected]
Subject: Re: [lng-odp] use of barrier in ODP programs

If an application has modified some shared data, it can call odp_sync_stores() 
before notifying other threads that they can now access that data. But in 
general this should not be needed, as the mechanism used for notification 
normally includes all necessary barriers in both the producer and the consumer.


On 3 September 2014 10:27, Bala Manoharan 
<[email protected]> wrote:
Hi,

I agree. My concern is that if the implementation has to take care of the 
barrier, then it has to do the sync during every dequeue operation, irrespective 
of whether the shared buffer has been written by other cores or not.
Can we optimize this by providing an API call which would be issued by the 
application before enqueue, only if it has modified the shared memory?
Regards,
Bala

On 3 September 2014 13:10, Ola Liljedahl 
<[email protected]> wrote:
What I am trying to say is that I believe (but please correct me if I am wrong) 
that the barriers that are part of the buffer enqueue/dequeue operations in the 
producer and consumer will cover all memory that is reachable from the buffer 
in question. The consumer cannot access the buffer user metadata before it has 
gained access to the buffer itself. And a producer should not access the user 
metadata of a buffer after the buffer has been enqueued (or freed).

AFAIK, barriers are not specific to certain memory regions but concern all loads 
and stores (and other memory-related operations) issued by a core (thread). Is 
Cavium MIPS different?

-- Ola


On 3 September 2014 08:27, Bala Manoharan 
<[email protected]> wrote:
Hi,

I agree completely that for the odp_buffer_t returned by odp_schedule(), the 
barrier should be maintained by the implementation. But what about the user 
metadata, which is added as a pointer to the odp_buffer_t? That data is linked 
to the buffer by the application. Since the user metadata is defined by the 
application, it could be a pointer to memory, or might contain additional 
pointers that refer to some common memory.
It might be difficult for the implementation to maintain the barrier in that 
case. Hence it should be clearly defined that for memory linked to the buffer 
as a pointer, the barrier must be maintained by the application, as the 
implementation has no control over that data.

Regards,
Bala

On 3 September 2014 03:09, Ola Liljedahl 
<[email protected]> wrote:
In general with ODP, I think we should push barriers out of the application and 
into the ODP implementation. We just need to be very explicit with what barrier 
semantics are guaranteed by ODP.

If one thread writes to a buffer or writes to data only reachable through that 
buffer (e.g. user metadata for that buffer) and the buffer is enqueued on a 
queue, when another thread dequeues (explicitly or through the scheduler) that 
same buffer, ODP will guarantee that the producer thread will have performed a 
store-release barrier (all stores preceding the enqueue will be visible before 
the enqueue is visible) and that the consumer thread performs a load-acquire 
barrier (all loads following the dequeue will only be executed after the 
dequeue). This means that all producer stores associated with that buffer will 
be observable by loads from the consumer, with no need for any explicit barrier 
in the application.

Linux-generic which uses (spin) locks for the queue implementation will 
automatically perform the necessary barriers (store-release when the producer 
releases the queue spin lock and load-acquire when the consumer takes the queue 
spin lock). On ARM we currently use DMB for all barriers; this is possibly the 
optimal design for ARMv7, but not for ARMv8.

On platforms with HW queues, the ODP implementation probably has to perform the 
barriers explicitly.

On 2 September 2014 17:09, Bala Manoharan 
<[email protected]> wrote:
Hi,

IMO, the synchronization should be done by the application: if the application 
does it, then it can decide to call sync only when a thread has written to the 
shared buffer, whereas if the implementation has to do the sync, then it will 
have to call it every time before the scheduler dispatches the buffer.
This synchronization is needed only when a buffer is queued between threads 
using odp_queue_enq(), as odp_schedule() guarantees that only one buffer gets 
processed on a core at any point in time.

Regards,
Bala


On 2 September 2014 19:16, Bill Fischofer 
<[email protected]> wrote:
The ODP queue APIs are guaranteed to be multicore and thread safe, so if such 
additional calls were needed, it would be a bug against them.

On Tue, Sep 2, 2014 at 8:32 AM, Ola Liljedahl 
<[email protected]> wrote:
If a thread writes to a buffer or some other memory only reachable through
this buffer and then enqueues the buffer on a queue, is there still a need
for a barrier (e.g. odp_sync_stores()) before calling odp_queue_enq()?

I assume that odp_queue_enq() includes (store-release) barrier semantics
(possibly implicitly by the use of spin locks).

I would think that the only way for another thread to be able to read this
buffer (or associated memory) would be to dequeue the buffer (and thus
include a load-acquire barrier). The buffer pointer cannot be obtained
before all remote stores have been made visible. The buffer being passed
from producer thread to consumer thread would thus be properly synchronized.

We probably need more specific barrier and synchronization calls in ODP.
ARMv8 has separate load-acquire and store-release barriers that could be
useful in places other than lock implementations.

-- Ola


_______________________________________________
lng-odp mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/lng-odp

