On 2/24/26 13:09, Oleg Nesterov wrote:
> On 02/23, Pavel Tikhomirov wrote:
>>
>> To avoid possible problems related to cpu/compiler optimizations around
>> ->child_reaper, let's use WRITE_ONCE (additional to task_list lock)
>> everywhere we write it and use READ_ONCE everywhere we read it without
>> explicit lock.
> 
> Yes, this is what I meant... but I can never recall if READ_ONCE() alone
> is enough to make KCSAN happy...

AFAICS this should be fine for memory safety of accesses to ->child_reaper.
I would love if someone more experienced in this area would confirm.

> 
> I won't insist, but I think it would be better to do this in a separate
> change for documenation purposes and for discussion.

Ok, will do. It will be a bit ugly as I will first add READ_ONCE to the
pidns_for_children_get() and then remove the hunk with it in the next patch.

> 
>> @@ -247,8 +247,9 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t 
>> *arg_set_tid,
>>                       * alreay in use. Return EEXIST in that case.
>>                       */
>>                      if (nr == -ENOSPC)
>> -
>>                              nr = -EEXIST;
>> +            } else if (!READ_ONCE(tmp->child_reaper) && 
>> idr_get_cursor(&tmp->idr) != 0) {
>> +                    nr = -EINVAL;
>>              } else {
> 
> Oh, this doesn't look clear/clean... This even looks racy even if it is not.
> Can you move this check into the "else" branch which does another get_cursor
> and unify this check with the RESERVED_PIDS check?
> 
> Either way, I don't like the fact we check ->child_reaper != NULL twice.

Notice that we don't really read it twice, it is only twice in code but any real
execution either passes the first check or the second one, not both at the same
time as they are in tid set/unset branches correspondingly.

> Perhaps something like the preparational patch below makes sense ? Not
> sure this is actually better...

This looks more universal at least, as instead of two checks we have one in one
place. My only concern of putting the check where I put it was to avoid extra
idr_alloc_cyclic() + idr_remove(), if we already know we don't need it. But it's
only in last pid_namespace we can have ->child_reaper unset so we do 
alloc/remove
for all other namespaces anyway in error case, should not be a big deal.

I will add the preparation patches: for below patch and related to _ONCE.

> 
> Oleg.
> 
> --- x/kernel/pid.c
> +++ x/kernel/pid.c
> @@ -215,12 +215,6 @@ struct pid *alloc_pid(struct pid_namespa
>                       retval = -EINVAL;
>                       if (tid < 1 || tid >= pid_max[ns->level - i])
>                               goto out_abort;
> -                     /*
> -                      * Also fail if a PID != 1 is requested and
> -                      * no PID 1 exists.
> -                      */
> -                     if (tid != 1 && !tmp->child_reaper)
> -                             goto out_abort;
>                       retval = -EPERM;
>                       if (!checkpoint_restore_ns_capable(tmp->user_ns))
>                               goto out_abort;
> @@ -299,6 +293,11 @@ struct pid *alloc_pid(struct pid_namespa
>               tmp = tmp->parent;
>               i--;
>               retried_preload = false;
> +
> +             if (!READ_ONCE(tmp->child_reaper) && nr != 1) {
> +                     retval = -EINVAL;
> +                     goto out_free;
> +             }
>       }
>  
>       /*
> 

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.


Reply via email to