Peter Zijlstra <pet...@infradead.org> wrote:
> On Tue, May 19, 2020 at 11:45:24PM +0200, Ahmed S. Darwish wrote:
> > @@ -713,10 +713,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
> >   */
> >  void lru_add_drain_all(void)
> >  {
>

Re-adding cut-out comment for context:

        /*
         * lru_drain_gen - Current generation of pages that could be in vectors
         *
         * (A) Definition: lru_drain_gen = x implies that all generations
         *     0 < n <= x are already scheduled for draining.
         *
         * This is an optimization for the highly-contended use case where a
         * user space workload keeps constantly generating a flow of pages
         * for each CPU.
         */
> > +   static unsigned int lru_drain_gen;
> >     static struct cpumask has_work;
> > +   static DEFINE_MUTEX(lock);
> > +   int cpu, this_gen;
> >
> >     /*
> >      * Make sure nobody triggers this path before mm_percpu_wq is fully
> > @@ -725,21 +735,48 @@ void lru_add_drain_all(void)
> >     if (WARN_ON(!mm_percpu_wq))
> >             return;
> >
>

Re-adding cut-out comment for context:

        /*
         * (B) Cache the LRU draining generation number
         *
         * smp_rmb() ensures that the counter is loaded before the mutex is
         * taken. It pairs with the smp_wmb() inside the mutex critical section
         * at (D).
         */
> > +   this_gen = READ_ONCE(lru_drain_gen);
> > +   smp_rmb();
>
>       this_gen = smp_load_acquire(&lru_drain_gen);

ACK. will do.
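(For reference, a minimal sketch of what that would look like here, assuming
the rest of the patch stays as is:

        /* (B) Cache the LRU draining generation number. */
        this_gen = smp_load_acquire(&lru_drain_gen);

The acquire load keeps the counter load ordered before everything that
follows, so the separate smp_rmb() and its pairing comment can go away.)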

> >
> >     mutex_lock(&lock);
> >
> >     /*
> > +    * (C) Exit the draining operation if a newer generation, from another
> > +    * lru_add_drain_all(), was already scheduled for draining. Check (A).
> >      */
> > +   if (unlikely(this_gen != lru_drain_gen))
> >             goto done;
> >
>

Re-adding cut-out comment for context:

        /*
         * (D) Increment generation number
         *
         * Pairs with READ_ONCE() and smp_rmb() at (B), outside of the critical
         * section.
         *
         * This pairing must be done here, before the for_each_online_cpu loop
         * below which drains the page vectors.
         *
         * Let x, y, and z represent some system CPU numbers, where x < y < z.
         * Assume CPU #z is in the middle of the for_each_online_cpu loop
         * below and has already reached CPU #y's per-cpu data. CPU #x comes
         * along, adds some pages to its per-cpu vectors, then calls
         * lru_add_drain_all().
         *
         * If the paired smp_wmb() below is done at any later step, e.g. after
         * the loop, CPU #x will just exit at (C) and miss flushing out all of
         * its added pages.
         */
> > +   WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
> > +   smp_wmb();
>
> You can leave this smp_wmb() out and rely on the smp_mb() implied by
> queue_work_on()'s test_and_set_bit().
>

Won't this be too implicit?

Isn't it possible that, over the years, the queue_work_on() implementation
changes and the test_and_set_bit()/smp_mb() gets removed?

If that happens, this commit will get *silently* broken and the local
CPU pages won't be drained.
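
If it is kept implicit, I'd at least want the dependency documented right
next to the increment. A rough sketch (comment wording is mine, not from the
patch):

        /*
         * (D) Increment the generation number.
         *
         * No explicit smp_wmb() here: we rely on the smp_mb() implied by
         * queue_work_on()'s test_and_set_bit() in the loop below to order
         * this store before the per-CPU drain work is scheduled.
         */
        WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);

Otherwise I'd prefer to keep the explicit smp_wmb().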

> >     cpumask_clear(&has_work);
> > -
> >     for_each_online_cpu(cpu) {
> >             struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
> >
>
> While you're here, do:
>
>       s/cpumask_set_cpu/__&/
>

ACK.
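
For reference, that makes it the non-atomic variant; a sketch of just the
changed line, assuming the surrounding loop stays as is:

        __cpumask_set_cpu(cpu, &has_work);

which should be fine here, since &has_work was just cleared and is only
touched while holding the mutex.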

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH
