If an application has modified some shared data, it can call odp_sync_stores() before notifying other threads that they can now access that data. But in general this should not be needed, as the mechanism used for notification normally includes all the necessary barriers in both the producer and the consumer.
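For illustration, here is a minimal sketch of that pattern, assuming an ODP implementation that provides odp_sync_stores(). The shared variables and the flag-based notification are hypothetical; with a real notification mechanism such as a queue, the explicit barrier should not be needed, as noted above.

#include <odp.h>

/* Hypothetical shared state; 'ready' stands in for an
 * application-level notification mechanism. */
static volatile int shared_counter;
static volatile int ready;

static void producer(void)
{
    shared_counter = 42; /* modify the shared data                  */
    odp_sync_stores();   /* make the store visible to other threads */
    ready = 1;           /* notify; the consumer still needs a      */
                         /* matching load barrier on its side       */
}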
On 3 September 2014 10:27, Bala Manoharan <[email protected]> wrote:
> Hi,
>
> I agree. My concern is that if the implementation has to take care of the
> barrier then it has to do the sync during every dequeue operation,
> irrespective of whether the shared buffer has been written by other cores
> or not.
>
> Can we optimize this by providing an API call which will be issued by the
> application before enqueue, only if it has modified the shared memory?
>
> Regards,
> Bala
>
>
> On 3 September 2014 13:10, Ola Liljedahl <[email protected]> wrote:
>
>> What I am trying to say is that I believe (but please correct me if I am
>> wrong) that the barriers that are part of the buffer enqueue/dequeue
>> operations in the producer and consumer will cover all memory that is
>> reachable from the buffer in question. The consumer cannot access the
>> buffer user metadata before it has gained access to the buffer itself.
>> And a producer should not access the user metadata of a buffer after the
>> buffer has been enqueued (or freed).
>>
>> AFAIK, barriers are not specific to certain memory regions but concern
>> all loads and stores (and other memory-related operations) issued by a
>> core (thread). Is Cavium MIPS different?
>>
>> -- Ola
>>
>>
>> On 3 September 2014 08:27, Bala Manoharan <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I agree completely, for the odp_buffer_t which is returned by
>>> odp_schedule(), that the barrier should be maintained by the
>>> implementation. But what about the user metadata which is added as a
>>> pointer to the odp_buffer_t? This data is linked with the buffer by the
>>> application. Since the user metadata is defined by the application, it
>>> could be a pointer to memory or might contain additional pointers which
>>> refer to some common memory.
>>>
>>> It might be difficult for the implementation to maintain the barrier in
>>> that case. Hence it should be clearly defined that, in the case of
>>> memory which is linked with the buffer as a pointer, the barrier should
>>> be maintained by the application, as the implementation has no control
>>> over that data.
>>>
>>> Regards,
>>> Bala
>>>
>>>
>>> On 3 September 2014 03:09, Ola Liljedahl <[email protected]> wrote:
>>>
>>>> In general with ODP, I think we should push barriers out of the
>>>> application and into the ODP implementation. We just need to be very
>>>> explicit about what barrier semantics are guaranteed by ODP.
>>>>
>>>> If one thread writes to a buffer or to data only reachable through
>>>> that buffer (e.g. user metadata for that buffer) and the buffer is
>>>> enqueued on a queue, then when another thread dequeues (explicitly or
>>>> through the scheduler) that same buffer, ODP will guarantee that the
>>>> producer thread has performed a store-release barrier (all stores
>>>> preceding the enqueue will be visible *before* the enqueue is visible)
>>>> and that the consumer thread performs a load-acquire barrier (all
>>>> loads following the dequeue will only be executed *after* the
>>>> dequeue). This means that all producer stores associated with that
>>>> buffer will be observable by loads from the consumer, with no need for
>>>> any explicit barrier in the application.
>>>>
>>>> Linux-generic, which uses (spin) locks for the queue implementation,
>>>> will automatically perform the necessary barriers (store-release when
>>>> the producer releases the queue spin lock and load-acquire when the
>>>> consumer takes the queue spin lock). On ARM we currently use DMB for
>>>> all barriers; possibly this is the optimal design for ARMv7 but not
>>>> for ARMv8.
>>>>
>>>> On platforms with HW queues, the ODP implementation probably has to
>>>> perform the barriers explicitly.
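The release/acquire pairing described above can be sketched with C11 atomics. This is illustrative only, not the ODP or linux-generic implementation; the single-slot "queue" and all names are made up for the example.

#include <stdatomic.h>
#include <stddef.h>

/* Single-slot "queue" illustrating the barrier pairing. */
static _Atomic(void *) slot;

/* Producer: the release store guarantees that every store made
 * before it (buffer contents, user metadata) is visible to a
 * consumer that observes the pointer. */
static void enqueue(void *buf)
{
    atomic_store_explicit(&slot, buf, memory_order_release);
}

/* Consumer: the acquire load guarantees that no later load is
 * reordered before it, so the buffer and its metadata can be read
 * safely once a non-NULL pointer is returned. */
static void *dequeue(void)
{
    return atomic_exchange_explicit(&slot, NULL, memory_order_acquire);
}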
>>>>
>>>> On 2 September 2014 17:09, Bala Manoharan <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> IMO, the synchronization can be called by the application: if the
>>>>> application does it, then it can decide to call sync only when a
>>>>> thread writes to the shared buffer, whereas if the implementation has
>>>>> to do the sync then it will have to call it every time before the
>>>>> scheduler dispatches the buffer.
>>>>>
>>>>> This synchronization is needed only when a buffer is queued between
>>>>> threads using odp_queue_enq(), as odp_schedule() guarantees that only
>>>>> one buffer gets processed on a core at any point in time.
>>>>>
>>>>> Regards,
>>>>> Bala
>>>>>
>>>>>
>>>>> On 2 September 2014 19:16, Bill Fischofer <[email protected]> wrote:
>>>>>
>>>>>> The ODP queue APIs are guaranteed to be multicore and thread safe,
>>>>>> so if such additional calls were needed it would be a bug against
>>>>>> them.
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 2, 2014 at 8:32 AM, Ola Liljedahl
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> If a thread writes to a buffer or some other memory only reachable
>>>>>>> through this buffer and then enqueues the buffer on a queue, is
>>>>>>> there still a need for a barrier (e.g. odp_sync_stores()) before
>>>>>>> calling odp_queue_enq()?
>>>>>>>
>>>>>>> I assume that odp_queue_enq() includes (store-release) barrier
>>>>>>> semantics (possibly implicitly by the use of spin locks).
>>>>>>>
>>>>>>> I would think that the only way for another thread to be able to
>>>>>>> read this buffer (or associated memory) would be to dequeue the
>>>>>>> buffer (and thus include a load-acquire barrier). The buffer
>>>>>>> pointer cannot be obtained before all remote stores have been made
>>>>>>> visible. The buffer being passed from producer thread to consumer
>>>>>>> thread would thus be properly synchronized.
>>>>>>>
>>>>>>> We probably need more specific barrier and synchronization calls in
>>>>>>> ODP. ARMv8 has separate load-acquire and store-release barriers
>>>>>>> that could be useful in other places than lock implementations.
>>>>>>>
>>>>>>> -- Ola
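The conclusion of the thread, that a lock-based enqueue already provides the store-release the application would otherwise need, can be sketched as follows. This is a simplified illustration with C11 atomics, not the linux-generic code; the queue layout and all names are hypothetical, and full/empty checks are omitted for brevity.

#include <stdatomic.h>

#define RING_SIZE 64

/* Minimal spin lock: taking it ends with an acquire, releasing it
 * is a release. Illustrative only. */
typedef struct {
    atomic_flag flag;
} spinlock_t;

static void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->flag,
                                             memory_order_acquire))
        ; /* spin */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

typedef struct {
    spinlock_t lock;
    void *ring[RING_SIZE];
    unsigned head, tail;
} queue_t;

/* Enqueue under the lock: the release in spin_unlock() makes all of
 * the producer's earlier stores (buffer contents, user metadata, and
 * the ring slot itself) visible to the next thread that takes the
 * lock, and the consumer's spin_lock() on dequeue supplies the
 * matching load-acquire. Hence no explicit odp_sync_stores() is
 * needed before enqueue. */
static void queue_enq(queue_t *q, void *buf)
{
    spin_lock(&q->lock);
    q->ring[q->tail++ % RING_SIZE] = buf;
    spin_unlock(&q->lock);
}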
_______________________________________________
lng-odp mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/lng-odp
