On Tue, Jun 17, 2014 at 02:27:43PM -0500, Christoph Lameter wrote:
> On Thu, 12 Jun 2014, Tejun Heo wrote:
> 
> > percpu areas are zeroed on allocation and, by their nature, accessed
> > from multiple cpus.  Consider the following scenario.
> 
> I am not sure that the premise is actually right. Percpu areas are
> designed to be accessed from a single cpu and we provide instances
> of variables for each cpu.
> 
> There is no synchronization guarantee for accesses from other cpus. If
> these accesses occur, we tolerate some fuzziness and usually only do
> read accesses. For example, for statistics we loop over all cpus to sum
> the percpu counters (a classic use case for percpu data).
> 
> But there are numerous uses where no accesses from other cpus are required
> (mostly when percpu stuff is not used for statistics but for cpu local
> lists and status).
> 
> Cross cpu write accesses typically occur only after the allocation and
> before the code that actually does something is aware that the percpu
> area has been allocated, or when the processor is being offlined/onlined.
> 
> >     p = NULL;
> >
> >     CPU-1                           CPU-2
> >  p = alloc_percpu()                 if (p)
> >                                       WARN_ON(this_cpu_read(*p));
> 
> p is an offset into the per cpu area of the processor. The value of p
> first has to be made available to CPU-2 somehow, and this usually provides
> the opportunity for synchronization that avoids the above scenario.
> 
> And so it is typical that these offsets are stored in larger structs that
> also have other means of synchronization.
> 
> For example, allocators take a global lock and then instantiate a new
> structure with the associated per cpu area allocation, which is added to a
> global list after it is ready. The address of the allocator structure
> is then made available to other processors.
> 
> Another method is to perform the allocation at bootup, which also
> requires no synchronization (e.g., the page allocator).
> 
> Similarly in swapon(): the percpu allocation is performed before access to
> the containing structure is enabled (via enable_swap_info).
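A minimal userspace sketch of the initialize-then-publish pattern described above, using C11 atomics in place of kernel primitives. All names here (struct stats, published_stats, NR_CPUS_SKETCH, stats_*) are hypothetical; the per cpu area is modeled as an array with one counter per CPU, and the release store / acquire load stand in for whatever lock or list insertion a real kernel user would rely on. A reader that sees a non-NULL pointer is guaranteed to also see the zeroed slots.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical CPU count; kernel code would iterate the possible-CPU mask. */
#define NR_CPUS_SKETCH 4

/* Hypothetical containing structure; percpu_count models a per cpu area. */
struct stats {
	atomic_long *percpu_count;
};

/* Publication point: stays NULL until the structure is fully initialized. */
static _Atomic(struct stats *) published_stats;

/* Allocate, zero every per-CPU slot, and only then publish the pointer. */
int stats_create_and_publish(void)
{
	struct stats *s = malloc(sizeof(*s));
	if (!s)
		return -1;
	s->percpu_count = malloc(NR_CPUS_SKETCH * sizeof(atomic_long));
	if (!s->percpu_count) {
		free(s);
		return -1;
	}
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		atomic_init(&s->percpu_count[cpu], 0);
	/* Release: the zeroed slots are visible no later than the pointer. */
	atomic_store_explicit(&published_stats, s, memory_order_release);
	return 0;
}

/* Fast path on the owning CPU: only that CPU's own slot is written. */
void stats_inc(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	struct stats *s = atomic_load_explicit(&published_stats,
					       memory_order_acquire);
	if (s)
		atomic_fetch_add_explicit(&s->percpu_count[cpu], 1,
					  memory_order_relaxed);
}

/* Cross-CPU reader: sums all slots; concurrent increments make the total
 * approximate. Returns -1 if the structure has not been published yet. */
long stats_sum(void)
{
	struct stats *s = atomic_load_explicit(&published_stats,
					       memory_order_acquire);
	if (!s)
		return -1;
	long sum = 0;
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += atomic_load_explicit(&s->percpu_count[cpu],
					    memory_order_relaxed);
	return sum;
}
```

The relaxed atomics on the counters only keep the C11 program free of data races; in-kernel per cpu counters would normally use this_cpu_inc() and accept the fuzzy cross-CPU sum.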

Those are indeed common use cases.  However...

There is code where one CPU writes to another CPU's per-CPU variables.
One example is RCU callback offloading, where a kernel thread (which
might be running anywhere) dequeues a given CPU's RCU callbacks and
processes them.  The act of dequeuing requires write access to that
CPU's per-CPU rcu_data structure.  And yes, atomic operations and memory
barriers are of course required to make this work.

                                                        Thanx, Paul
