On 12/13/16 10:50, Savolainen, Petri (Nokia - FI/Espoo) wrote:
>
>>> +static void add_group(sched_group_t *sched_group, int thr, int group)
>>> +{
>>> + int num;
>>> + uint32_t gen_cnt;
>>> + thr_group_t *thr_group = &sched_group->s.thr[thr];
>>> +
>>> + num = thr_group->num_group;
>>> + thr_group->group[num] = group;
>>> + thr_group->num_group = num + 1;
>>> + gen_cnt = odp_atomic_load_u32(&thr_group->gen_cnt);
>>> + odp_atomic_store_u32(&thr_group->gen_cnt, gen_cnt + 1);
>>
>>
>> odp_atomic_inc_u32()
>
> Atomic, in-memory increment would be waste here since this operation is done
> while holding a lock. The variable is atomic since it's checked with load_acq
> in sched_cmd() (when not holding a lock).
>
>
>>
>>
>>> +}
>>> +
>>> +static void remove_group(sched_group_t *sched_group, int thr, int
>> group)
>>> +{
>>> + int i, num;
>>> + int found = 0;
>>> + thr_group_t *thr_group = &sched_group->s.thr[thr];
>>> +
>>> + num = thr_group->num_group;
>>> +
>>> + for (i = 0; i < num; i++) {
>>> + if (thr_group->group[i] == group) {
>>> + found = 1;
>>> +
>>> + for (; i < num - 1; i++)
>>> + thr_group->group[i] = thr_group->group[i + 1];
>>> +
>>> + break;
>>> + }
>>> + }
>>> +
>>> + if (found) {
>>> + uint32_t gen_cnt;
>>> +
>>> + thr_group->num_group = num - 1;
>>> + gen_cnt = odp_atomic_load_u32(&thr_group->gen_cnt);
>>> + odp_atomic_store_u32(&thr_group->gen_cnt, gen_cnt + 1);
>>
>> odp_atomic_inc_u32()
>>
>> and you can remove if found by placing inc and break above. That should
>> fit in 80 characters.
>
>
> Same thing here. Load + store is more performant than an unnecessary atomic,
> in-memory inc.
>
> -Petri
>
I think it depends. On ABI compat mode that will be function and you
need 2 stores in stack.
And what is faster depends on implementation of internal
__atomic_fetch_add call, which may vary on gcc version. I never measured
what is faster, but it's easy to check...
Maxim.
>