On 03/14, Peter Zijlstra wrote:
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10346,6 +10346,17 @@ void perf_event_free_task(struct task_struct *task)
>                       continue;
>
>               mutex_lock(&ctx->mutex);
> +             raw_spin_lock_irq(&ctx->lock);
> +             /*
> +              * Destroy the task <-> ctx relation and mark the context dead.
> +              *
> +              * This is important because even though the task hasn't been
> +              * exposed yet the context has been (through child_list).
> +              */
> +             RCU_INIT_POINTER(task->perf_event_ctxp[ctxn], NULL);
> +             WRITE_ONCE(ctx->task, TASK_TOMBSTONE);
> +             put_task_struct(task); /* cannot be last */
> +             raw_spin_unlock_irq(&ctx->lock);

Agreed, this is what I had in mind. Although, you know, I spent 3
hours looking at your patch and I still can't convince myself it
really closes all the races ;)

OK, I believe this is correct. And IIUC, neither RCU_INIT_POINTER(NULL)
nor put_task_struct() is strictly necessary? At least until we add
WARN_ON(tsk->usage != 2) before free_task() in copy_process().
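(For readers following along: here is a minimal userspace sketch of the
tombstone pattern from the hunk above. This is my own illustration, not
the kernel code; in the kernel the store is a WRITE_ONCE() under
ctx->lock, and TASK_TOMBSTONE is defined as ((void *)-1L) in
kernel/events/core.c.)

```c
#include <assert.h>
#include <stddef.h>

struct task;                            /* opaque; never dereferenced here */
struct ctx { struct task *task; };

/*
 * Any non-NULL value that can never be a valid pointer works as the
 * sentinel; the kernel uses ((void *)-1L).
 */
#define TASK_TOMBSTONE ((struct task *)-1L)

/* In the kernel this store is WRITE_ONCE() done under ctx->lock. */
static void mark_ctx_dead(struct ctx *ctx)
{
	ctx->task = TASK_TOMBSTONE;
}

/*
 * A concurrent path that takes ctx->lock can now detect that the
 * context is dead and bail out, instead of chasing a task that is
 * being freed.
 */
static int ctx_is_dead(const struct ctx *ctx)
{
	return ctx->task == TASK_TOMBSTONE;
}
```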


---------------------------------------------------------------------
This is off-topic, but to me the list_for_each_entry(event->child_list)
loop in perf_event_release_kernel() looks very confusing and misleading.
The same goes for list_first_entry_or_null(): we do not really need NULL
when the list is empty, "tmp == child" would be false even with plain
list_first_entry(). And given that we already have list_is_last(), it
would be nice to add list_is_first() and clean up
perf_event_release_kernel() a bit:

        --- x/kernel/events/core.c
        +++ x/kernel/events/core.c
        @@ -4152,7 +4152,7 @@ static void put_event(struct perf_event 
         int perf_event_release_kernel(struct perf_event *event)
         {
                struct perf_event_context *ctx = event->ctx;
        -       struct perf_event *child, *tmp;
        +       struct perf_event *child;
         
                /*
                 * If we got here through err_file: fput(event_file); we will not have
        @@ -4190,8 +4190,9 @@ int perf_event_release_kernel(struct per
         
         again:
                mutex_lock(&event->child_mutex);
        -       list_for_each_entry(child, &event->child_list, child_list) {
        -
        +       if (!list_empty(&event->child_list)) {
        +               child = list_first_entry(&event->child_list,
        +                                        struct perf_event, child_list);
                        /*
                         * Cannot change, child events are not migrated, see the
                         * comment with perf_event_ctx_lock_nested().
        @@ -4221,9 +4222,7 @@ again:
                 * state, if child is still the first entry, it didn't get freed
                         * and we can continue doing so.
                         */
        -               tmp = list_first_entry_or_null(&event->child_list,
        -                                              struct perf_event, child_list);
        -               if (tmp == child) {
        +               if (list_is_first(&child->child_list, &event->child_list)) {
                                perf_remove_from_context(child, DETACH_GROUP);
                                list_del(&child->child_list);
                                free_event(child);

But we can't, because

        static inline int list_is_first(const struct list_head *list,
                                        const struct list_head *head)
        {
                return list->prev == head;
        }

won't work: "child" can already be freed, so we can't dereference it, and

        static inline int list_is_first(const struct list_head *list,
                                        const struct list_head *head)
        {
                return head->next == list;
        }

won't be symmetrical with the list_is_last() we already have.
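(To make the asymmetry concrete: a userspace sketch below, my own
re-implementation of the relevant list.h bits rather than the kernel
header, and "list_is_first_noderef" is a hypothetical name for the
head->next variant. The existing list_is_last() dereferences its first
argument, which is fine because the caller holds the entry; doing the
same in list_is_first() is exactly what we cannot afford here.)

```c
#include <assert.h>

/* Minimal circular doubly-linked list, modeled on include/linux/list.h. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Existing helper: dereferences 'list'; the caller holds the entry. */
static int list_is_last(const struct list_head *list,
			const struct list_head *head)
{
	return list->next == head;
}

/*
 * Symmetric variant: also dereferences 'list', so it is unusable when
 * the entry may already have been freed.
 */
static int list_is_first(const struct list_head *list,
			 const struct list_head *head)
{
	return list->prev == head;
}

/*
 * Safe-but-asymmetric variant (hypothetical name): only dereferences
 * the list head, so 'list' is compared by address and never touched.
 */
static int list_is_first_noderef(const struct list_head *list,
				 const struct list_head *head)
{
	return head->next == list;
}
```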

Oleg.
