On Thu, 2012-10-04 at 11:02 -0400, Steven Rostedt wrote:

> void __init softirq_early_init(void)
> {
>       local_irq_lock_init(local_softirq_lock);
> }
> 
> Where:
> 
> #define local_irq_lock_init(lvar)                                     \
>       do {                                                            \
>               int __cpu;                                              \
>               for_each_possible_cpu(__cpu)                            \
>                       spin_lock_init(&per_cpu(lvar, __cpu).lock);     \
>       } while (0)
> 
> As the softirq lock is a local_irq_lock, which is a per_cpu lock, the
> initialization is done to all per_cpu versions of the lock. But let's
> look at where the softirq_early_init() is called from.
> 
> In init/main.c: start_kernel()
> 
> /*
>  * Interrupts are still disabled. Do necessary setups, then
>  * enable them
>  */
>       softirq_early_init();
>       tick_init();
>       boot_cpu_init();
>       page_address_init();
>       printk(KERN_NOTICE "%s", linux_banner);
>       setup_arch(&command_line);
>       mm_init_owner(&init_mm, &init_task);
>       mm_init_cpumask(&init_mm);
>       setup_command_line(command_line);
>       setup_nr_cpu_ids();
>       setup_per_cpu_areas();
>       smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
> 
> One of the first things that is called is the initialization of the
> softirq lock. But if you look further down, we see the per_cpu areas
> have not been set up yet. Thus initializing a local_irq_lock before
> the per_cpu areas are set up may not work, as it initializes the
> per_cpu locks before the per_cpu areas exist.
> 
> By moving the softirq_early_init() right after setup_per_cpu_areas(),
> the kernel boots fine.
> 

I investigated why this still works on x86 by adding some printks:

void __init softirq_early_init(void)
{
        int __cpu;
        printk("init softirq locks\n");
        local_irq_lock_init(local_softirq_lock);

        printk("list locks\n");
        for_each_possible_cpu(__cpu)
                printk("local_softirq_lock[%d].node_list=%p\n", __cpu,
                       per_cpu(local_softirq_lock, __cpu).lock.lock.wait_list.node_list.prev);
}

The output was:

Initializing cgroup subsys cpu
init softirq locks
list locks
Linux version 3.2.30-test-rt45+ (rostedt@goliath) (gcc version 4.6.0 (GCC) ) #262 SMP PREEMPT RT Thu Oct 4 15:48:16 EDT 2012
Command line: ro root=/dev/mapper/VG01-F13x64 rd_LVM_LV=VG01/F13x64 rd_NO_LUKS rd_NO_MD rd_NO_DM console=ttyS0,115200 ignore_loglevel selinux=0 earlyprintk=ttyS0,115200 ftrace_dump_on_oops


Note that it printed "list locks" but never printed anything for that
loop. It seems that before the per_cpu areas are set up,
for_each_possible_cpu() does not iterate at all. To confirm this, I added
the same loop to spawn_ksoftirqd(), and it shows this:

... fixed-purpose events:   3
... event mask:             0000000700000003
local_softirq_lock[0].node_list=          (null)
local_softirq_lock[1].node_list=          (null)
local_softirq_lock[2].node_list=          (null)
local_softirq_lock[3].node_list=          (null)
NMI watchdog enabled, takes one hw-pmu counter.
Booting Node   0, Processors  #1
smpboot cpu 1: start_ip = 98000

Yep, the node_list was never initialized.
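
The silent loop can be reproduced in userspace. The following is only a
minimal sketch, assuming the set of possible CPUs is still empty at that
point on x86 (the missing printk output suggests this; where the mask gets
populated was not traced here). possible_mask, for_each_set_cpu(),
count_iterations() and mark_possible() are hypothetical names, not kernel
API.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 4

/* Hypothetical stand-in for the possible-CPU mask: one bit per CPU. */
static uint32_t possible_mask;

/* Hypothetical stand-in for for_each_possible_cpu(): visits set bits only. */
#define for_each_set_cpu(cpu, mask)                             \
        for ((cpu) = 0; (cpu) < MAX_CPUS; (cpu)++)              \
                if ((mask) & (UINT32_C(1) << (cpu)))

/* Counts how often the body of an init loop over the mask would run. */
static int count_iterations(void)
{
        int cpu, n = 0;

        for_each_set_cpu(cpu, possible_mask)
                n++;
        return n;
}

/* Marks one CPU as possible, as later boot code would. */
static void mark_possible(int cpu)
{
        assert(cpu >= 0 && cpu < MAX_CPUS);
        possible_mask |= UINT32_C(1) << cpu;
}
```

With an empty mask, count_iterations() reports zero passes, so an init
loop written this way does nothing and reports no error either.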

This doesn't crash x86 because it is saved by:

static inline void init_lists(struct rt_mutex *lock)
{
        if (unlikely(!lock->wait_list.node_list.prev))
                plist_head_init(&lock->wait_list);
}

and the first time something blocks on the lock, the wait_list is
initialized.
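
The effect of that guard can be sketched in userspace as follows. This is
an illustration under assumptions, not the rt_mutex code: struct list_node,
struct demo_lock, list_init(), lazy_init() and list_is_sane() are
hypothetical names, and a plain doubly linked list stands in for the plist.
It shows that a NULL check rescues zeroed memory, but not a head that was
initialized at a different address.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified list head; the real wait_list is a plist. */
struct list_node {
        struct list_node *prev, *next;
};

struct demo_lock {
        struct list_node wait_list;
};

/* An empty list head points at itself in both directions. */
static void list_init(struct list_node *head)
{
        head->prev = head;
        head->next = head;
}

/* Same idea as init_lists(): a NULL prev means "never initialized". */
static void lazy_init(struct demo_lock *lock)
{
        assert(lock != NULL);
        if (!lock->wait_list.prev)
                list_init(&lock->wait_list);
}

/* Returns nonzero if the head's neighbours point back at the head. */
static int list_is_sane(const struct list_node *head)
{
        return head->prev && head->next &&
               head->next->prev == head && head->prev->next == head;
}
```

A zeroed lock passed through lazy_init() ends up sane. A lock whose
pointers were copied from another, already initialized lock is left alone
by lazy_init(), because its prev is not NULL, and stays inconsistent.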


It crashes on powerpc because, there, for_each_possible_cpu() actually
does loop:

(on powerpc box)

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
init softirq locks
list locks
local_softirq_lock[0].node_list=c000000000781f00
local_softirq_lock[1].node_list=c000000000781f00
Linux version 3.2.30-test-rt45-dirty (rostedt@goliath) (gcc version 4.6.0 (GCC) ) #24 SMP PREEMPT RT Thu Oct 4 15:55:07 EDT 2012
[0000] : CF000012

The problem is that per_cpu() returns the same pointer for every CPU
passed to it (as you can see, the node_list pointers are identical).
Because node_list was initialized, but to the wrong pointer, the
init_lists() above will not correct the problem as it did on x86. Once
the wait_list starts being used, it soon becomes corrupted.
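
The aliasing can be modelled in userspace. This is a rough sketch under
stated assumptions, not the powerpc implementation: it assumes every CPU's
per_cpu offset is still zero before setup (which the identical pointers
above suggest) and that setup copies the link-time template into each
CPU's area. struct fake_lock, storage[], cpu_offset[], fake_per_cpu(),
fake_lock_init_all() and fake_setup_per_cpu_areas() are all hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CPUS 2

/* A lock whose initialized state is a pointer to itself, like a list head. */
struct fake_lock {
        struct fake_lock *self;
};

/* storage[0] models the link-time template; [1..] the per-CPU copies. */
static struct fake_lock storage[NUM_CPUS + 1];

/* Per-CPU offsets, all zero until the areas are set up. */
static size_t cpu_offset[NUM_CPUS];

/* Stand-in for per_cpu(): template address plus that CPU's offset. */
static struct fake_lock *fake_per_cpu(int cpu)
{
        assert(cpu >= 0 && cpu < NUM_CPUS);
        return &storage[cpu_offset[cpu]];
}

/* Stand-in for local_irq_lock_init(): initialize every CPU's lock. */
static void fake_lock_init_all(void)
{
        int cpu;

        for (cpu = 0; cpu < NUM_CPUS; cpu++) {
                struct fake_lock *l = fake_per_cpu(cpu);

                l->self = l;
        }
}

/* Stand-in for setup_per_cpu_areas(): assign offsets, copy the template. */
static void fake_setup_per_cpu_areas(void)
{
        int cpu;

        for (cpu = 0; cpu < NUM_CPUS; cpu++) {
                cpu_offset[cpu] = (size_t)cpu + 1;
                storage[cpu + 1] = storage[0];
        }
}
```

Calling fake_lock_init_all() before fake_setup_per_cpu_areas() initializes
only the template, and each per-CPU copy then inherits a self pointer that
refers to the template rather than to itself. Calling it after setup gives
every CPU a lock that points at its own copy.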

Moving the init to after the per_cpu setup, I get this:

pcpu-alloc: s84096 r0 d46976 u524288 alloc=1*1048576
pcpu-alloc: [0] 0 1 
init softirq locks
list locks
local_softirq_lock[0].node_list=c000000001001f00
local_softirq_lock[1].node_list=c000000001081f00
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 16370

As you can see, the node_list pointers are now unique per CPU.

-- Steve

