Merged,

Maxim.

On 12/14/16 10:58, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> 
> 
>> -----Original Message-----
>> From: Maxim Uvarov [mailto:[email protected]]
>> Sent: Tuesday, December 13, 2016 4:06 PM
>> To: Savolainen, Petri (Nokia - FI/Espoo) <petri.savolainen@nokia-bell-
>> labs.com>; [email protected]
>> Subject: Re: [lng-odp] [API-NEXT PATCH] linux-gen: schedule_sp: use ring
>> as priority queue
>>
>> On 12/13/16 10:50, Savolainen, Petri (Nokia - FI/Espoo) wrote:
>>>
>>>>> +static void add_group(sched_group_t *sched_group, int thr, int group)
>>>>> +{
>>>>> + int num;
>>>>> + uint32_t gen_cnt;
>>>>> + thr_group_t *thr_group = &sched_group->s.thr[thr];
>>>>> +
>>>>> + num = thr_group->num_group;
>>>>> + thr_group->group[num] = group;
>>>>> + thr_group->num_group  = num + 1;
>>>>> + gen_cnt = odp_atomic_load_u32(&thr_group->gen_cnt);
>>>>> + odp_atomic_store_u32(&thr_group->gen_cnt, gen_cnt + 1);
>>>>
>>>>
>>>> odp_atomic_inc_u32()
>>>
>>> Atomic, in-memory increment would be waste here since this operation is
>> done while holding a lock. The variable is atomic since it's checked with
>> load_acq in sched_cmd() (when not holding a lock).
>>>
>>>
>>>>
>>>>
>>>>> +}
>>>>> +
>>>>> +static void remove_group(sched_group_t *sched_group, int thr, int
>>>> group)
>>>>> +{
>>>>> + int i, num;
>>>>> + int found = 0;
>>>>> + thr_group_t *thr_group = &sched_group->s.thr[thr];
>>>>> +
>>>>> + num = thr_group->num_group;
>>>>> +
>>>>> + for (i = 0; i < num; i++) {
>>>>> +         if (thr_group->group[i] == group) {
>>>>> +                 found = 1;
>>>>> +
>>>>> +                 for (; i < num - 1; i++)
>>>>> +                         thr_group->group[i] = thr_group->group[i + 1];
>>>>> +
>>>>> +                 break;
>>>>> +         }
>>>>> + }
>>>>> +
>>>>> + if (found) {
>>>>> +         uint32_t gen_cnt;
>>>>> +
>>>>> +         thr_group->num_group = num - 1;
>>>>> +         gen_cnt = odp_atomic_load_u32(&thr_group->gen_cnt);
>>>>> +         odp_atomic_store_u32(&thr_group->gen_cnt, gen_cnt + 1);
>>>>
>>>> odp_atomic_inc_u32()
>>>>
>>>> and you can remove if found by placing inc and break above. That should
>>>> fit in 80 characters.
>>>
>>>
>>> Same thing here. Load + store is more performant than an unnecessary
>> atomic, in-memory inc.
>>>
>>> -Petri
>>>
>>
>> I think it depends. On ABI compat mode that will be function and you
>> need 2 stores in stack.
>>
>> And what is faster depends on implementation of internal
>> __atomic_fetch_add call, which may vary on gcc version. I never measured
>> what is faster, but it's easy to check...
>>
>> Maxim.
> 
> 
> No, it does not depend on GCC. Those are plain load and store instructions, 
> so there's no atomicity guarantees needed. Also atomics can and should be 
> inlined even in ABI compat mode (which is CPU arch specific anyway). I think 
> those are inlined already using GCC __atomics but for strict ABI compat we 
> should write those in assembly.
> 
> The atomic_inc implements *in-memory* atomic increment, which needs much more 
> synchronization than a load or a store (which does not provide any sync). So, 
> load + store == always about 2 CPU cycles. Atomic_inc may be e.g. 30 cycles, 
> depending on the HW implementation. The main point is that this code section 
> is synchronized with a lock already and it does not need any extra 
> (in-memory) atomic synchronization because of that.
> 
> The variable is atomic only because it's checked with load_acq in sched_cmd().
> 
> -Petri
> 
> 

Reply via email to