On Thu, Jan 23, 2014 at 05:22:54PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 23, 2014 at 05:28:29AM -0800, Kent Overstreet wrote:
> > pool->lock is also going to be fairly badly contended in the worst case,
> > and that can get real bad real fast... now that I think about it, we
> > probably want to avoid the __alloc_global_tag() double call just because
> > of that: pool->lock is going to be quite a bit more contended than the
> > waitlist lock, simply because of the amount of work done under it.
> 
> On top of the two previous patches, I think we can reduce pool->lock contention
> by not holding it while doing steal_tags().
> 
> By dropping pool->lock around steal_tags() we lose serialization over:
> 
>   pool->cpus_have_tags, which is an atomic bitmask anyway, and
>   pool->cpu_last_stolen, which is only a heuristic, so sod it.
> 
> We further lose the guarantee relied upon by percpu_ida_free(), so have
> it also acquire the tags->lock, which should be a far less contended
> resource.
> 
> Now everything modifying percpu_ida_cpu state holds
> percpu_ida_cpu::lock, everything that modifies the actual percpu_ida
> freelists holds percpu_ida::lock, and percpu_ida_cpu::lock nests inside
> percpu_ida::lock.
> 
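Just to spell the resulting order out (a sketch of the discipline, not a
quote from the patch; this is what the percpu_ida_free() hunk below does):

	spin_lock(&pool->lock);		/* percpu_ida::lock, outer */
	spin_lock(&tags->lock);		/* percpu_ida_cpu::lock, inner */
	...
	spin_unlock(&tags->lock);
	spin_unlock(&pool->lock);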
> 
> The only annoying thing is that we're still holding IRQs over
> steal_tags(); we should be able to make that a preempt_disable() without
> too much effort, or very much cheat and drop even that and rely on the
> percpu_ida_cpu::lock to serialize everything and just hope that we don't
> migrate too often.
> 
> But that's for another patch.
> 
> ---
> --- a/lib/percpu_ida.c
> +++ b/lib/percpu_ida.c
> @@ -68,8 +68,6 @@ static inline void steal_tags(struct per
>       unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
>       struct percpu_ida_cpu *remote;
>  
> -     lockdep_assert_held(&pool->lock);
> -
>       for (cpus_have_tags = cpumask_weight(&pool->cpus_have_tags);
>            cpus_have_tags * pool->percpu_max_size > pool->nr_tags / 2;
>            cpus_have_tags--) {

Two concurrent threads can find and unset the very same bit a few lines
below:
                cpu = cpumask_next(cpu, &pool->cpus_have_tags);

                [...]

                cpumask_clear_cpu(cpu, &pool->cpus_have_tags);

The second thread's unset races with the cpumask_set_cpu() in
percpu_ida_free() or alloc_global_tag():

        if (nr_free == 1) {
                cpumask_set_cpu(smp_processor_id(),
                                &pool->cpus_have_tags);
                wake_up(&pool->wait);
        }

As a result, the woken thread does not find the (illegitimately) cleared
bit while looping over the mask in steal_tags(). I suspect this or another
thread might then enter an unwakeable sleep, since the number of set bits
in the mask and the number of threads in the waitqueue have become
unbalanced.
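
To make the window concrete, here is a userspace sketch of that schedule.
All names are made up; cpu X's bit in pool->cpus_have_tags is modelled as
a single flag and the remote freelist as a counter. It only illustrates
the ordering, it is not a reliable reproducer:

	/* gcc -pthread sketch.c */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_int have_tags = 1;	/* cpu X's bit in the mask */
	static atomic_int free_tags = 1;	/* remote->nr_free on cpu X */

	static void *stealer(void *arg)
	{
		if (atomic_load(&have_tags)) {		/* cpumask_next() */
			atomic_exchange(&free_tags, 0);	/* move_tags() */
			atomic_store(&have_tags, 0);	/* cpumask_clear_cpu() */
		}
		return arg;
	}

	static void *freer(void *arg)
	{
		atomic_store(&free_tags, 1);	/* nr_free == 1 again... */
		atomic_store(&have_tags, 1);	/* ...so cpumask_set_cpu(), */
		return arg;			/* followed by wake_up() */
	}

	int main(void)
	{
		pthread_t a, b, f;

		pthread_create(&a, NULL, stealer, NULL);
		pthread_create(&b, NULL, stealer, NULL);
		pthread_create(&f, NULL, freer, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		pthread_join(f, NULL);

		/* Lost bit: a free tag exists but is not advertised. */
		if (atomic_load(&free_tags) && !atomic_load(&have_tags))
			printf("free tag present, bit clear: lost wakeup\n");
		return 0;
	}

The bad schedule: both stealers see the flag set, the first empties the
freelist and clears the flag, the second finds the freelist already
empty, the freer then puts a tag back and sets the flag, and the second
stealer's stale clear finally wipes that fresh set out.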

I hope I am wrong here.
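
FWIW, if the race is real, one way to narrow the window might be to make
the test and the clear atomic, so that only one stealer ever clears a
given bit. A rough, untested sketch against steal_tags(), using the
existing cpumask_test_and_clear_cpu() primitive:

	cpu = cpumask_next(cpu, &pool->cpus_have_tags);

	[...]

	if (!cpumask_test_and_clear_cpu(cpu, &pool->cpus_have_tags))
		continue;	/* another stealer got here first */

That would at least prevent two stealers from both clearing the same bit;
whether it also sorts out the ordering against the cpumask_set_cpu() in
percpu_ida_free() I have not thought through.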

> @@ -141,18 +139,24 @@ static inline int alloc_global_tag(struc
>                         min(pool->nr_free, pool->percpu_batch_size));
>       }
>  
> +     spin_unlock(&pool->lock);
> +
>       if (!tags->nr_free)
>               steal_tags(pool, tags);
>  
>       if (tags->nr_free) {
> -             tag = tags->freelist[--tags->nr_free];
> +             spin_lock(&tags->lock);
>               if (tags->nr_free) {
> -                     cpumask_set_cpu(smp_processor_id(),
> -                                     &pool->cpus_have_tags);
> +                     tag = tags->freelist[--tags->nr_free];
> +                     if (tags->nr_free) {
> +                             cpumask_set_cpu(smp_processor_id(),
> +                                             &pool->cpus_have_tags);
> +                     }
>               }
> +             spin_unlock(&tags->lock);
>       }
>  
> -     spin_unlock_irqrestore(&pool->lock, flags);
> +     local_irq_restore(flags);
>  
>       return tag;
>  }
> @@ -238,12 +242,8 @@ void percpu_ida_free(struct percpu_ida *
>  
>       if (nr_free == pool->percpu_max_size) {
>               spin_lock(&pool->lock);
> +             spin_lock(&tags->lock);
>  
> -             /*
> -              * Global lock held and irqs disabled, don't need percpu lock
> -              * because everybody accessing remote @tags will hold
> -              * pool->lock -- steal_tags().
> -              */
>               if (tags->nr_free == pool->percpu_max_size) {
>                       move_tags(pool->freelist, &pool->nr_free,
>                                 tags->freelist, &tags->nr_free,
> @@ -251,6 +251,8 @@ void percpu_ida_free(struct percpu_ida *
>  
>                       wake_up(&pool->wait);
>               }
> +
> +             spin_unlock(&tags->lock);
>               spin_unlock(&pool->lock);
>       }
>  

-- 
Regards,
Alexander Gordeev
agord...@redhat.com