Boot warnings on exynos5420 based boards
Hi,

I observe the below warnings while trying to boot Exynos5420 based boards since yesterday's linux-next (next-20140616) using multi_v7_defconfig. Looks like it is triggered by the commit 56e6921829 (CPU hotplug, smp: flush any pending IPI callbacks before CPU offline). Any ideas?

[0.046521] Exynos MCPM support installed
[0.048939] CPU1: Booted secondary processor
[0.065005] CPU1: update cpu_capacity 1535
[0.065011] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
[0.065660] CPU2: Booted secondary processor
[0.085005] CPU2: update cpu_capacity 1535
[0.085012] CPU2: thread -1, cpu 2, socket 0, mpidr 8002
[0.085662] CPU3: Booted secondary processor
[0.105005] CPU3: update cpu_capacity 1535
[0.105011] CPU3: thread -1, cpu 3, socket 0, mpidr 8003
[1.105031] CPU4: failed to come online
[1.105081] [ cut here ]
[1.105104] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228 flush_smp_call_function_queue+0xc0/0x178()
[1.105112] Modules linked in:
[1.105129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-next-20140616-2-g38f9385a061b #2035
[1.105157] [c02160f0] (unwind_backtrace) from [c0211c8c] (show_stack+0x10/0x14)
[1.105179] [c0211c8c] (show_stack) from [c0853794] (dump_stack+0x8c/0x9c)
[1.105198] [c0853794] (dump_stack) from [c024bdf4] (warn_slowpath_common+0x70/0x8c)
[1.105216] [c024bdf4] (warn_slowpath_common) from [c024beac] (warn_slowpath_null+0x1c/0x24)
[1.105235] [c024beac] (warn_slowpath_null) from [c02a3944] (flush_smp_call_function_queue+0xc0/0x178)
[1.105253] [c02a3944] (flush_smp_call_function_queue) from [c02a3a94] (hotplug_cfd+0x98/0xd8)
[1.105269] [c02a3a94] (hotplug_cfd) from [c026b064] (notifier_call_chain+0x44/0x84)
[1.105285] [c026b064] (notifier_call_chain) from [c024c1a4] (_cpu_up+0x120/0x170)
[1.105302] [c024c1a4] (_cpu_up) from [c024c264] (cpu_up+0x70/0x94)
[1.105319] [c024c264] (cpu_up) from [c0b5839c] (smp_init+0xac/0xb0)
[1.105337] [c0b5839c] (smp_init) from [c0b2fc54] (kernel_init_freeable+0x90/0x1dc)
[1.105353] [c0b2fc54] (kernel_init_freeable) from [c0851248] (kernel_init+0xc/0xe8)
[1.105368] [c0851248] (kernel_init) from [c020e7f8] (ret_from_fork+0x14/0x3c)
[1.105389] ---[ end trace bc66942e4ab63168 ]---
[2.105047] CPU5: failed to come online
[2.105073] [ cut here ]
[2.105091] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228 flush_smp_call_function_queue+0xc0/0x178()
[2.105099] Modules linked in:
[2.105114] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.15.0-next-20140616-2-g38f9385a061b #2035
[2.105135] [c02160f0] (unwind_backtrace) from [c0211c8c] (show_stack+0x10/0x14)
[2.105153] [c0211c8c] (show_stack) from [c0853794] (dump_stack+0x8c/0x9c)
[2.105170] [c0853794] (dump_stack) from [c024bdf4] (warn_slowpath_common+0x70/0x8c)
[2.105187] [c024bdf4] (warn_slowpath_common) from [c024beac] (warn_slowpath_null+0x1c/0x24)
[2.105205] [c024beac] (warn_slowpath_null) from [c02a3944] (flush_smp_call_function_queue+0xc0/0x178)
[2.105222] [c02a3944] (flush_smp_call_function_queue) from [c02a3a94] (hotplug_cfd+0x98/0xd8)
[2.105237] [c02a3a94] (hotplug_cfd) from [c026b064] (notifier_call_chain+0x44/0x84)
[2.105252] [c026b064] (notifier_call_chain) from [c024c1a4] (_cpu_up+0x120/0x170)
[2.105268] [c024c1a4] (_cpu_up) from [c024c264] (cpu_up+0x70/0x94)
[2.105285] [c024c264] (cpu_up) from [c0b5839c] (smp_init+0xac/0xb0)
[2.105301] [c0b5839c] (smp_init) from [c0b2fc54] (kernel_init_freeable+0x90/0x1dc)
[2.105316] [c0b2fc54] (kernel_init_freeable) from [c0851248] (kernel_init+0xc/0xe8)
[2.105330] [c0851248] (kernel_init) from [c020e7f8] (ret_from_fork+0x14/0x3c)
[2.105339] ---[ end trace bc66942e4ab63169 ]---
[3.105064] CPU6: failed to come online
[3.105089] [ cut here ]
[3.105107] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228 flush_smp_call_function_queue+0xc0/0x178()
[3.105115] Modules linked in:
[3.105131] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.15.0-next-20140616-2-g38f9385a061b #2035
[3.105150] [c02160f0] (unwind_backtrace) from [c0211c8c] (show_stack+0x10/0x14)
[3.105168] [c0211c8c] (show_stack) from [c0853794] (dump_stack+0x8c/0x9c)
[3.105185] [c0853794] (dump_stack) from [c024bdf4] (warn_slowpath_common+0x70/0x8c)
[3.105202] [c024bdf4] (warn_slowpath_common) from [c024beac] (warn_slowpath_null+0x1c/0x24)
[3.105220] [c024beac] (warn_slowpath_null) from [c02a3944] (flush_smp_call_function_queue+0xc0/0x178)
[3.105237] [c02a3944] (flush_smp_call_function_queue) from [c02a3a94] (hotplug_cfd+0x98/0xd8)
[3.105252] [c02a3a94] (hotplug_cfd) from [c026b064] (notifier_call_chain+0x44/0x84)
[3.105267] [c026b064] (notifier_call_chain) from [c024c1a4] (_cpu_up+0x120/0x170)
[3.105283] [c024c1a4] (_cpu_up)
Re: Boot warnings on exynos5420 based boards
Hi Sachin,

On 06/17/2014 01:39 PM, Sachin Kamat wrote:
> Hi,
>
> I observe the below warnings while trying to boot Exynos5420 based boards
> since yesterday's linux-next (next-20140616) using multi_v7_defconfig. Looks

I guess you meant next-20140617.

> like it is triggered by the commit 56e6921829 (CPU hotplug, smp: flush any
> pending IPI callbacks before CPU offline). Any ideas?
>
> [0.046521] Exynos MCPM support installed
> [0.048939] CPU1: Booted secondary processor
> [0.065005] CPU1: update cpu_capacity 1535
> [0.065011] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
> [0.065660] CPU2: Booted secondary processor
> [0.085005] CPU2: update cpu_capacity 1535
> [0.085012] CPU2: thread -1, cpu 2, socket 0, mpidr 8002
> [0.085662] CPU3: Booted secondary processor
> [0.105005] CPU3: update cpu_capacity 1535
> [0.105011] CPU3: thread -1, cpu 3, socket 0, mpidr 8003
> [1.105031] CPU4: failed to come online
> [1.105081] [ cut here ]
> [1.105104] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228 flush_smp_call_function_queue+0xc0/0x178()
> [1.105112] Modules linked in:
> [1.105129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-next-20140616-2-g38f9385a061b #2035
> [1.105157] [c02160f0] (unwind_backtrace) from [c0211c8c] (show_stack+0x10/0x14)
> [1.105179] [c0211c8c] (show_stack) from [c0853794] (dump_stack+0x8c/0x9c)
> [1.105198] [c0853794] (dump_stack) from [c024bdf4] (warn_slowpath_common+0x70/0x8c)
> [1.105216] [c024bdf4] (warn_slowpath_common) from [c024beac] (warn_slowpath_null+0x1c/0x24)
> [1.105235] [c024beac] (warn_slowpath_null) from [c02a3944] (flush_smp_call_function_queue+0xc0/0x178)
> [1.105253] [c02a3944] (flush_smp_call_function_queue) from [c02a3a94] (hotplug_cfd+0x98/0xd8)
> [1.105269] [c02a3a94] (hotplug_cfd) from [c026b064] (notifier_call_chain+0x44/0x84)
> [1.105285] [c026b064] (notifier_call_chain) from [c024c1a4] (_cpu_up+0x120/0x170)
> [1.105302] [c024c1a4] (_cpu_up) from [c024c264] (cpu_up+0x70/0x94)
> [1.105319] [c024c264] (cpu_up) from [c0b5839c] (smp_init+0xac/0xb0)
> [1.105337] [c0b5839c] (smp_init) from [c0b2fc54] (kernel_init_freeable+0x90/0x1dc)
> [1.105353] [c0b2fc54] (kernel_init_freeable) from [c0851248] (kernel_init+0xc/0xe8)
> [1.105368] [c0851248] (kernel_init) from [c020e7f8] (ret_from_fork+0x14/0x3c)
> [1.105389] ---[ end trace bc66942e4ab63168 ]---

Argh! I had put the switch-case handling for CPU_DYING at the 'wrong' place, since I hadn't noticed that CPU_UP_CANCELED silently falls-through to CPU_DEAD. This is what happens when people don't explicitly write fall-through in the comments in a switch-case statement :-(

Below is an updated patch, please let me know how it goes. (You'll have to revert c47a9d7cca first, and then 56e692182, before trying this patch).

[c47a9d7cca - CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline]
[56e692182 - CPU hotplug, smp: flush any pending IPI callbacks before CPU offline]

Andrew, can you please use this patch instead? Thanks a lot!

--
From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
[PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline

There is a race between the CPU offline code (within stop-machine) and the smp-call-function code, which can lead to getting IPIs on the outgoing CPU, *after* it has gone offline.

Specifically, this can happen when using smp_call_function_single_async() to send the IPI, since this API allows sending asynchronous IPIs from IRQ disabled contexts. The exact race condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other CPUs disable their local interrupts. Due to this, we can encounter a situation in which an IPI is sent by one of the other CPUs to the outgoing CPU (while it is *still* online), but the outgoing CPU ends up noticing it only *after* it has gone offline.
CPU 1                                   CPU 2
(Online CPU)                            (CPU going offline)

Enter _PREPARE stage                    Enter _PREPARE stage

                                        Enter _DISABLE_IRQ stage

                                      =
Got a device interrupt, and           | Didn't notice the IPI
the interrupt handler sent an         | since interrupts were
IPI to CPU 2 using                    | disabled on this CPU.
smp_call_function_single_async()      |
                                      =

Enter _DISABLE_IRQ stage

Enter _RUN stage                        Enter _RUN
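The fall-through mistake described at the top of this reply can be illustrated with a small standalone C model. This is only a sketch: the notifier code itself is not quoted anywhere in this thread, so the case layout, the `hp_event` codes, `struct cfd_model` and all function names below are hypothetical stand-ins. It assumes the new CPU_DYING label ended up between CPU_UP_CANCELED and CPU_DEAD, which would be consistent with the trace above, where hotplug_cfd() calls flush_smp_call_function_queue() from _cpu_up() right after a CPU fails to come online.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes, loosely named after the hotplug notifications. */
enum hp_event { HP_UP_PREPARE, HP_UP_CANCELED, HP_DYING, HP_DEAD };

/* Stand-ins for the real side effects of the notifier. */
struct cfd_model {
    bool allocated;   /* per-CPU data set up in HP_UP_PREPARE */
    int  flushes;     /* how often the callback queue was flushed */
};

/* Misplaced case (assumed layout): HP_UP_CANCELED used to fall through to
 * HP_DEAD, but the new HP_DYING label sits between the two and captures it. */
void notify_misplaced(struct cfd_model *m, enum hp_event ev)
{
    switch (ev) {
    case HP_UP_PREPARE:
        m->allocated = true;
        break;
    case HP_UP_CANCELED:   /* no break and no comment: easy to overlook */
    case HP_DYING:
        m->flushes++;
        break;
    case HP_DEAD:
        m->allocated = false;
        break;
    }
}

/* Intended layout: the fall-through is kept intact and documented. */
void notify_intended(struct cfd_model *m, enum hp_event ev)
{
    switch (ev) {
    case HP_UP_PREPARE:
        m->allocated = true;
        break;
    case HP_UP_CANCELED:
        /* fall through: a cancelled bring-up is cleaned up like a dead CPU */
    case HP_DEAD:
        m->allocated = false;
        break;
    case HP_DYING:
        m->flushes++;
        break;
    }
}

/* Replays a failed bring-up (prepare, then cancel) through one handler. */
struct cfd_model replay_cancelled_bringup(
        void (*notify)(struct cfd_model *, enum hp_event))
{
    struct cfd_model m = { false, 0 };

    notify(&m, HP_UP_PREPARE);
    assert(m.allocated);          /* both layouts allocate on prepare */
    notify(&m, HP_UP_CANCELED);
    return m;
}
```

Calling `replay_cancelled_bringup(notify_misplaced)` leaves the model flushed once and still allocated, while `replay_cancelled_bringup(notify_intended)` leaves it unflushed and cleaned up. The explicit fall-through comment is the cheap safeguard the reply asks for.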
Re: Boot warnings on exynos5420 based boards
Hi Srivatsa,

Thanks for your prompt reply.

On Tue, Jun 17, 2014 at 2:48 PM, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote:
> Hi Sachin,
>
> On 06/17/2014 01:39 PM, Sachin Kamat wrote:
>> Hi,
>>
>> I observe the below warnings while trying to boot Exynos5420 based boards
>> since yesterday's linux-next (next-20140616) using multi_v7_defconfig. Looks
>
> I guess you meant next-20140617.

I meant I started observing this warning next-20140616 onwards (next-20140617 as well).

>> like it is triggered by the commit 56e6921829 (CPU hotplug, smp: flush any
>> pending IPI callbacks before CPU offline). Any ideas?
>>
>> [0.046521] Exynos MCPM support installed
>> [0.048939] CPU1: Booted secondary processor
>> [0.065005] CPU1: update cpu_capacity 1535
>> [0.065011] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
>> [0.065660] CPU2: Booted secondary processor
>> [0.085005] CPU2: update cpu_capacity 1535
>> [0.085012] CPU2: thread -1, cpu 2, socket 0, mpidr 8002
>> [0.085662] CPU3: Booted secondary processor
>> [0.105005] CPU3: update cpu_capacity 1535
>> [0.105011] CPU3: thread -1, cpu 3, socket 0, mpidr 8003
>> [1.105031] CPU4: failed to come online
>> [1.105081] [ cut here ]
>> [1.105104] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228 flush_smp_call_function_queue+0xc0/0x178()
>> [1.105112] Modules linked in:
>> [1.105129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-next-20140616-2-g38f9385a061b #2035
>> [1.105157] [c02160f0] (unwind_backtrace) from [c0211c8c] (show_stack+0x10/0x14)
>> [1.105179] [c0211c8c] (show_stack) from [c0853794] (dump_stack+0x8c/0x9c)
>> [1.105198] [c0853794] (dump_stack) from [c024bdf4] (warn_slowpath_common+0x70/0x8c)
>> [1.105216] [c024bdf4] (warn_slowpath_common) from [c024beac] (warn_slowpath_null+0x1c/0x24)
>> [1.105235] [c024beac] (warn_slowpath_null) from [c02a3944] (flush_smp_call_function_queue+0xc0/0x178)
>> [1.105253] [c02a3944] (flush_smp_call_function_queue) from [c02a3a94] (hotplug_cfd+0x98/0xd8)
>> [1.105269] [c02a3a94] (hotplug_cfd) from [c026b064] (notifier_call_chain+0x44/0x84)
>> [1.105285] [c026b064] (notifier_call_chain) from [c024c1a4] (_cpu_up+0x120/0x170)
>> [1.105302] [c024c1a4] (_cpu_up) from [c024c264] (cpu_up+0x70/0x94)
>> [1.105319] [c024c264] (cpu_up) from [c0b5839c] (smp_init+0xac/0xb0)
>> [1.105337] [c0b5839c] (smp_init) from [c0b2fc54] (kernel_init_freeable+0x90/0x1dc)
>> [1.105353] [c0b2fc54] (kernel_init_freeable) from [c0851248] (kernel_init+0xc/0xe8)
>> [1.105368] [c0851248] (kernel_init) from [c020e7f8] (ret_from_fork+0x14/0x3c)
>> [1.105389] ---[ end trace bc66942e4ab63168 ]---
>
> Argh! I had put the switch-case handling for CPU_DYING at the 'wrong' place,
> since I hadn't noticed that CPU_UP_CANCELED silently falls-through to CPU_DEAD.
> This is what happens when people don't explicitly write fall-through in the
> comments in a switch-case statement :-(
>
> Below is an updated patch, please let me know how it goes. (You'll have to
> revert c47a9d7cca first, and then 56e692182, before trying this patch).

I am unable to apply your below patch on top of the above 2 reverts.

Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline
fatal: corrupt patch at line 106
Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline

Even with 'patch' I get the below failures:

patching file kernel/smp.c
Hunk #2 FAILED at 53.
Hunk #3 FAILED at 179.
2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej

Regards,
Sachin.
--
To unsubscribe from this list: send the line unsubscribe linux-samsung-soc in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot warnings on exynos5420 based boards
On 06/17/2014 03:03 PM, Sachin Kamat wrote:
>> Below is an updated patch, please let me know how it goes. (You'll have to
>> revert c47a9d7cca first, and then 56e692182, before trying this patch).
>
> I am unable to apply your below patch on top of the above 2 reverts.
>
> Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline
> fatal: corrupt patch at line 106
> Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline
>
> Even with 'patch' I get the below failures:
>
> patching file kernel/smp.c
> Hunk #2 FAILED at 53.
> Hunk #3 FAILED at 179.
> 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej

Hmm, weird. My mailer must have screwed it up. Let's try again:

[In case this also doesn't work for you, please use this git tree in which I have reverted the 2 old commits and added this updated patch.

git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3 ]

From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
[PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline

There is a race between the CPU offline code (within stop-machine) and the smp-call-function code, which can lead to getting IPIs on the outgoing CPU, *after* it has gone offline.

Specifically, this can happen when using smp_call_function_single_async() to send the IPI, since this API allows sending asynchronous IPIs from IRQ disabled contexts. The exact race condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other CPUs disable their local interrupts. Due to this, we can encounter a situation in which an IPI is sent by one of the other CPUs to the outgoing CPU (while it is *still* online), but the outgoing CPU ends up noticing it only *after* it has gone offline.
CPU 1                                   CPU 2
(Online CPU)                            (CPU going offline)

Enter _PREPARE stage                    Enter _PREPARE stage

                                        Enter _DISABLE_IRQ stage

                                      =
Got a device interrupt, and           | Didn't notice the IPI
the interrupt handler sent an         | since interrupts were
IPI to CPU 2 using                    | disabled on this CPU.
smp_call_function_single_async()      |
                                      =

Enter _DISABLE_IRQ stage

Enter _RUN stage                        Enter _RUN stage

                          =
Busy loop with interrupts |             Invoke take_cpu_down()
disabled.                 |             and take CPU 2 offline
                          =

Enter _EXIT stage                       Enter _EXIT stage

Re-enable interrupts                    Re-enable interrupts

                                        The pending IPI is noted
                                        immediately, but alas,
                                        the CPU is offline at
                                        this point.

This of course, makes the smp-call-function IPI handler code running on CPU 2 unhappy and it complains about receiving an IPI on an offline CPU.

One real example of the scenario on CPU 1 is the block layer's complete-request call-path:

__blk_complete_request() [interrupt-handler]
    raise_blk_irq()
        smp_call_function_single_async()

However, if we look closely, the block layer does check that the target CPU is online before firing the IPI. So in this case, it is actually the unfortunate ordering/timing of events in the stop-machine phase that leads to receiving IPIs after the target CPU has gone offline.

In reality, getting a late IPI on an offline CPU is not too bad by itself (this can happen even due to hardware latencies in IPI send-receive). It is a bug only if the target CPU really went offline without executing all the callbacks queued on its list. (Note that a CPU is free to execute its pending smp-call-function callbacks in a batch, without waiting for the corresponding IPIs to arrive for each one of those callbacks).

So, fixing this issue can be broken up into two parts:

1. Ensure that a CPU goes offline only after executing all the callbacks queued on it.

2. Modify the warning condition in the smp-call-function IPI handler code such that it warns only if an offline CPU got an IPI *and* that CPU had gone offline with callbacks still pending in its queue.
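The two parts listed above can be sketched as a small userspace model. It is not the patch itself (the diff is not reproduced in this thread): `struct model_cpu`, `struct callback`, `flush_queue()` and the other names are hypothetical, and a plain singly linked list stands in for the kernel's per-CPU queue. The sketch only shows the idea of flushing before the CPU is marked offline and warning solely when an offline CPU still has callbacks pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One queued cross-CPU function call (hypothetical stand-in). */
struct callback {
    void (*func)(void *info);
    void *info;
    struct callback *next;
};

/* Minimal model of the target CPU. */
struct model_cpu {
    bool online;
    struct callback *queue;   /* pending callbacks, newest first */
    int warnings;             /* how often the handler complained */
};

void model_cpu_init(struct model_cpu *cpu)
{
    cpu->online = true;
    cpu->queue = NULL;
    cpu->warnings = 0;
}

/* Sender side: queue a callback; in the kernel an IPI would follow. */
void queue_callback(struct model_cpu *cpu, struct callback *cb)
{
    assert(cb->func != NULL);
    cb->next = cpu->queue;
    cpu->queue = cb;
}

/* Runs everything that is queued and returns how many callbacks ran.
 * Part 2: warn only if the CPU is offline AND work was still pending.
 * Callbacks run newest first here; ordering is not modelled. */
int flush_queue(struct model_cpu *cpu, bool warn_cpu_offline)
{
    struct callback *entry = cpu->queue;
    int ran = 0;

    cpu->queue = NULL;

    if (warn_cpu_offline && !cpu->online && entry != NULL)
        cpu->warnings++;

    while (entry != NULL) {
        struct callback *next = entry->next;  /* read before running */
        entry->func(entry->info);
        entry = next;
        ran++;
    }
    return ran;
}

/* Part 1: flush on the outgoing CPU before it is marked offline. */
int model_cpu_dying(struct model_cpu *cpu)
{
    int ran = flush_queue(cpu, false);

    cpu->online = false;
    return ran;
}

/* IPI arrival, possibly late, i.e. after the CPU went offline. */
int model_ipi_handler(struct model_cpu *cpu)
{
    return flush_queue(cpu, true);
}

/* Example callback: counts invocations through its info pointer. */
void count_hit(void *info)
{
    ++*(int *)info;
}
```

Replaying the race from the diagram against this model: a callback queued while the CPU is online is executed by `model_cpu_dying()`, so the late `model_ipi_handler()` call finds an empty queue and stays silent. A warning is only counted if something is queued after the CPU has already gone offline.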
Achieving part 1 is straight-forward - just flush (execute) all the queued callbacks on the outgoing CPU in the CPU_DYING stage[1], including those callbacks for
Re: Boot warnings on exynos5420 based boards
Hi Srivatsa,

On Tue, Jun 17, 2014 at 3:24 PM, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote:
> On 06/17/2014 03:03 PM, Sachin Kamat wrote:
>>> Below is an updated patch, please let me know how it goes. (You'll have to
>>> revert c47a9d7cca first, and then 56e692182, before trying this patch).
>>
>> I am unable to apply your below patch on top of the above 2 reverts.
>>
>> Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline
>> fatal: corrupt patch at line 106
>> Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline
>>
>> Even with 'patch' I get the below failures:
>>
>> patching file kernel/smp.c
>> Hunk #2 FAILED at 53.
>> Hunk #3 FAILED at 179.
>> 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej
>
> Hmm, weird. My mailer must have screwed it up. Let's try again:
>
> [In case this also doesn't work for you, please use this git tree in which
> I have reverted the 2 old commits and added this updated patch.
>
> git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3 ]

Unfortunately the attached patch did not apply either. Nevertheless, I applied the patch from your above mentioned tree. With that patch I do not see the warnings that I mentioned in my first mail. Thanks for fixing it.

Regards,
Sachin.
Re: Boot warnings on exynos5420 based boards
On 06/17/2014 05:25 PM, Sachin Kamat wrote:
> Hi Srivatsa,
>
> On Tue, Jun 17, 2014 at 3:24 PM, Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote:
>> On 06/17/2014 03:03 PM, Sachin Kamat wrote:
>>>> Below is an updated patch, please let me know how it goes. (You'll have to
>>>> revert c47a9d7cca first, and then 56e692182, before trying this patch).
>>>
>>> I am unable to apply your below patch on top of the above 2 reverts.
>>>
>>> Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline
>>> fatal: corrupt patch at line 106
>>> Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline
>>>
>>> Even with 'patch' I get the below failures:
>>>
>>> patching file kernel/smp.c
>>> Hunk #2 FAILED at 53.
>>> Hunk #3 FAILED at 179.
>>> 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej
>>
>> Hmm, weird. My mailer must have screwed it up. Let's try again:
>>
>> [In case this also doesn't work for you, please use this git tree in which
>> I have reverted the 2 old commits and added this updated patch.
>>
>> git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3 ]
>
> Unfortunately the attached patch did not apply either. Nevertheless, I applied
> the patch from your above mentioned tree. With that patch I do not see the
> warnings that I mentioned in my first mail. Thanks for fixing it.

Sure, thanks for reporting the bug and testing the updated patch!

By the way, I think there is some problem in the workflow that you use to copy-paste/apply the patch. I tried applying both patches (that I sent in 2 different mails) and both applied properly without any problems.

Regards,
Srivatsa S. Bhat