On Tue, 04 Dec 2012 14:23:41 +0530
"Srivatsa S. Bhat" <[email protected]> wrote:

> From: Michael Wang <[email protected]>
> 
> There are places where preempt_disable() is used to prevent any CPU from
> going offline during the critical section. Let us call them "atomic
> hotplug readers" (atomic because they run in atomic contexts).
> 
> Often, these atomic hotplug readers have a simple need: they want the cpu
> online mask that they work with (inside their critical section) to be
> stable, i.e., it should be guaranteed that CPUs in that mask won't go
> offline during the critical section. An important point here is that they
> don't really care if such a "stable" mask is a subset of the actual
> cpu_online_mask.
> 
> The intent of this patch is to provide such a "stable" cpu online mask
> for that class of atomic hotplug readers.
> 
> Fundamental idea behind the design:
> -----------------------------------
> 
> Simply put, have a separate mask called the stable cpu online mask; and
> at the hotplug writer (cpu_down()), note down the CPU that is about to go
> offline, and remove it from the stable cpu online mask. Then, feel free
> to take that CPU offline, since the atomic hotplug readers won't see it
> from now on. Also, don't start any new cpu_down() operations until all
> existing atomic hotplug readers have completed (because they might still
> be working with the old value of the stable online mask).
> 
> Some important design requirements and considerations:
> -----------------------------------------------------
> 
> 1. The atomic hotplug readers should ideally *never* wait for the hotplug
>    writer (cpu_down()) for *anything*. These atomic hotplug readers can be
>    in very hot paths like interrupt handling/IPI, and hence, if they had
>    to wait for an ongoing cpu_down() to complete, it would pretty much
>    introduce the same performance/latency problems as stop_machine().
> 
> 2. Any synchronization on the atomic hotplug readers' side must be highly
>    scalable - avoid global locks/counters etc. These paths currently use
>    the extremely fast preempt_disable(); our replacement for
>    preempt_disable() should not become ridiculously costly.
> 
> 3. preempt_disable() is recursive. The replacement should also be recursive.
> 
> Implementation of the design:
> -----------------------------
> 
> Atomic hotplug reader side:
> 
> We use per-cpu counters to mark the presence of atomic hotplug readers.
> A reader increments its per-cpu counter and continues, without waiting
> for anything, and no locks are used in this path. Together, these
> properties satisfy all three requirements mentioned above.
> 
> The hotplug writer reads the per-cpu counters of all CPUs in order to
> ensure that all existing atomic hotplug readers have completed. Only after
> that does it proceed to actually take the CPU offline.
> 
> [ Michael: Designed the synchronization for the IPI case ]

Like this:

[[email protected]: designed the synchronization for the IPI case]

> Signed-off-by: Michael Wang <[email protected]>
> [ Srivatsa: Generalized it to work for all cases and wrote the changelog ]
> Signed-off-by: Srivatsa S. Bhat <[email protected]>
>
> ...
>
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;
>  
>  extern void get_online_cpus(void);
>  extern void put_online_cpus(void);
> +extern void get_online_cpus_stable_atomic(void);
> +extern void put_online_cpus_stable_atomic(void);
>  #define hotcpu_notifier(fn, pri)     cpu_notifier(fn, pri)
>  #define register_hotcpu_notifier(nb) register_cpu_notifier(nb)
>  #define unregister_hotcpu_notifier(nb)       unregister_cpu_notifier(nb)
> @@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void)
>  
>  #define get_online_cpus()    do { } while (0)
>  #define put_online_cpus()    do { } while (0)
> +#define get_online_cpus_stable_atomic()      do { } while (0)
> +#define put_online_cpus_stable_atomic()      do { } while (0)

static inline C functions would be preferred if possible.  Feel free to
fix up the crufty surrounding code as well ;)
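
Something like this, perhaps (same names as in the patch, untested):

        static inline void get_online_cpus_stable_atomic(void) { }
        static inline void put_online_cpus_stable_atomic(void) { }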

>
> ...
>
> @@ -705,12 +707,15 @@ extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS);
>  
>  #define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_mask)
>  #define for_each_online_cpu(cpu)   for_each_cpu((cpu), cpu_online_mask)
> +#define for_each_online_cpu_stable(cpu)                             \
> +                             for_each_cpu((cpu), cpu_online_stable_mask)
>  #define for_each_present_cpu(cpu)  for_each_cpu((cpu), cpu_present_mask)
>  
>  /* Wrappers for arch boot code to manipulate normally-constant masks */
>  void set_cpu_possible(unsigned int cpu, bool possible);
>  void set_cpu_present(unsigned int cpu, bool present);
>  void set_cpu_online(unsigned int cpu, bool online);
> +void set_cpu_online_stable(unsigned int cpu, bool online);

The naming is inconsistent.  This is "cpu_online_stable", but
for_each_online_cpu_stable() is "online_cpu_stable".  Can we make
everything the same?
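
For example, sticking with the "cpu_online_stable" stem everywhere
(hypothetical spelling, just to illustrate):

        #define for_each_cpu_online_stable(cpu)                         \
                        for_each_cpu((cpu), cpu_online_stable_mask)

        void set_cpu_online_stable(unsigned int cpu, bool online);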

>  void set_cpu_active(unsigned int cpu, bool active);
>  void init_cpu_present(const struct cpumask *src);
>  void init_cpu_possible(const struct cpumask *src);
>
> ...
>
> +void get_online_cpus_stable_atomic(void)
> +{
> +     preempt_disable();
> +     this_cpu_inc(hotplug_reader_refcount);
> +
> +     /*
> +      * Prevent reordering of writes to hotplug_reader_refcount and
> +      * reads from cpu_online_stable_mask.
> +      */
> +     smp_mb();
> +}
> +EXPORT_SYMBOL_GPL(get_online_cpus_stable_atomic);
> +
> +void put_online_cpus_stable_atomic(void)
> +{
> +     /*
> +      * Prevent reordering of reads from cpu_online_stable_mask and
> +      * writes to hotplug_reader_refcount.
> +      */
> +     smp_mb();
> +     this_cpu_dec(hotplug_reader_refcount);
> +     preempt_enable();
> +}
> +EXPORT_SYMBOL_GPL(put_online_cpus_stable_atomic);
> +
>  static struct {
>       struct task_struct *active_writer;
>       struct mutex lock; /* Synchronizes accesses to refcount, */
> @@ -237,6 +304,44 @@ static inline void check_for_tasks(int cpu)
>       write_unlock_irq(&tasklist_lock);
>  }
>  
> +/*
> + * We want all atomic hotplug readers to refer to the new value of
> + * cpu_online_stable_mask. So wait for the existing atomic hotplug readers
> + * to complete. Any new atomic hotplug readers will see the updated mask
> + * and hence pose no problems.
> + */
> +static void sync_hotplug_readers(void)
> +{
> +     unsigned int cpu;
> +
> +     for_each_online_cpu(cpu) {
> +             while (per_cpu(hotplug_reader_refcount, cpu))
> +                     cpu_relax();
> +     }
> +}

That all looks solid to me.

> +/*
> + * We are serious about taking this CPU down. So clear the CPU from the
> + * stable online mask.
> + */
> +static void prepare_cpu_take_down(unsigned int cpu)
> +{
> +     set_cpu_online_stable(cpu, false);
> +
> +     /*
> +      * Prevent reordering of writes to cpu_online_stable_mask and reads
> +      * from hotplug_reader_refcount.
> +      */
> +     smp_mb();
> +
> +     /*
> +      * Wait for all active hotplug readers to complete, to ensure that
> +      * there are no critical sections still referring to the old stable
> +      * online mask.
> +      */
> +     sync_hotplug_readers();
> +}
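
The smp_mb() above pairs with the barriers in
get/put_online_cpus_stable_atomic(), if I'm reading it correctly:

        reader                                   writer (take_cpu_down)
        ------                                   ----------------------
        this_cpu_inc(hotplug_reader_refcount);   set_cpu_online_stable(cpu, false);
        smp_mb();                                smp_mb();
        reads cpu_online_stable_mask             reads hotplug_reader_refcount

Spelling that pairing out in a comment would help future readers.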

I wonder about the cpu-online case.  A typical caller might want to do:


/*
 * Set each online CPU's "foo" to "bar"
 */

int global_bar;

DEFINE_PER_CPU(int, foo);

void set_cpu_foo(int bar)
{
        int cpu;

        get_online_cpus_stable_atomic();
        global_bar = bar;
        for_each_online_cpu_stable(cpu)
                per_cpu(foo, cpu) = bar;
        put_online_cpus_stable_atomic();
}

void cpu_online_notifier_handler(void)
{
        this_cpu_write(foo, global_bar);
}

And I think that set_cpu_foo() would be buggy, because a CPU could come
online before global_bar was altered, and that newly-online CPU would
pick up the old value of `bar'.

So what's the rule here?  global_bar must be written before we run
get_online_cpus_stable_atomic()?

Anyway, please have a think and spell all this out?
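
For instance, if the rule turns out to be "publish the shared state before
taking the stable-mask snapshot", then set_cpu_foo() above would become
something like this (hypothetical sketch; notifier-side barriers omitted):

        void set_cpu_foo(int bar)
        {
                int cpu;

                global_bar = bar;       /* publish the new value first */
                smp_wmb();              /* order it before the mask snapshot */

                get_online_cpus_stable_atomic();
                for_each_online_cpu_stable(cpu)
                        per_cpu(foo, cpu) = bar;
                put_online_cpus_stable_atomic();
        }

so that a CPU which comes online after global_bar is written will pick up
the new value in its online notifier.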

>  struct take_cpu_down_param {
>       unsigned long mod;
>       void *hcpu;
> @@ -246,7 +351,9 @@ struct take_cpu_down_param {
>  static int __ref take_cpu_down(void *_param)
>  {
>       struct take_cpu_down_param *param = _param;
> -     int err;
> +     int err, cpu = (long)(param->hcpu);
> +

Like this please:

        int err;
        int cpu = (long)(param->hcpu);

> +     prepare_cpu_take_down(cpu);
>  
>       /* Ensure this CPU doesn't handle any more interrupts. */
>       err = __cpu_disable();
>
> ...
>
