Boot warnings on exynos5420 based boards

2014-06-17 Thread Sachin Kamat
Hi,

I observe the below warnings while trying to boot Exynos5420 based boards
since yesterday's linux-next (next-20140616) using multi_v7_defconfig. Looks
like it is triggered by the commit 56e6921829 (CPU hotplug, smp:
flush any pending IPI callbacks before CPU offline). Any ideas?


*
[0.046521] Exynos MCPM support installed
[0.048939] CPU1: Booted secondary processor
[0.065005] CPU1: update cpu_capacity 1535
[0.065011] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
[0.065660] CPU2: Booted secondary processor
[0.085005] CPU2: update cpu_capacity 1535
[0.085012] CPU2: thread -1, cpu 2, socket 0, mpidr 8002
[0.085662] CPU3: Booted secondary processor
[0.105005] CPU3: update cpu_capacity 1535
[0.105011] CPU3: thread -1, cpu 3, socket 0, mpidr 8003
[1.105031] CPU4: failed to come online
[1.105081] [ cut here ]
[1.105104] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228
flush_smp_call_function_queue+0xc0/0x178()
[1.105112] Modules linked in:
[1.105129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
3.15.0-next-20140616-2-g38f9385a061b #2035
[1.105157] [c02160f0] (unwind_backtrace) from [c0211c8c]
(show_stack+0x10/0x14)
[1.105179] [c0211c8c] (show_stack) from [c0853794]
(dump_stack+0x8c/0x9c)
[1.105198] [c0853794] (dump_stack) from [c024bdf4]
(warn_slowpath_common+0x70/0x8c)
[1.105216] [c024bdf4] (warn_slowpath_common) from [c024beac]
(warn_slowpath_null+0x1c/0x24)
[1.105235] [c024beac] (warn_slowpath_null) from [c02a3944]
(flush_smp_call_function_queue+0xc0/0x178)
[1.105253] [c02a3944] (flush_smp_call_function_queue) from
[c02a3a94] (hotplug_cfd+0x98/0xd8)
[1.105269] [c02a3a94] (hotplug_cfd) from [c026b064]
(notifier_call_chain+0x44/0x84)
[1.105285] [c026b064] (notifier_call_chain) from [c024c1a4]
(_cpu_up+0x120/0x170)
[1.105302] [c024c1a4] (_cpu_up) from [c024c264] (cpu_up+0x70/0x94)
[1.105319] [c024c264] (cpu_up) from [c0b5839c] (smp_init+0xac/0xb0)
[1.105337] [c0b5839c] (smp_init) from [c0b2fc54]
(kernel_init_freeable+0x90/0x1dc)
[1.105353] [c0b2fc54] (kernel_init_freeable) from [c0851248]
(kernel_init+0xc/0xe8)
[1.105368] [c0851248] (kernel_init) from [c020e7f8]
(ret_from_fork+0x14/0x3c)
[1.105389] ---[ end trace bc66942e4ab63168 ]---
[2.105047] CPU5: failed to come online
[2.105073] [ cut here ]
[2.105091] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228
flush_smp_call_function_queue+0xc0/0x178()
[2.105099] Modules linked in:
[2.105114] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW
3.15.0-next-20140616-2-g38f9385a061b #2035
[2.105135] [c02160f0] (unwind_backtrace) from [c0211c8c]
(show_stack+0x10/0x14)
[2.105153] [c0211c8c] (show_stack) from [c0853794]
(dump_stack+0x8c/0x9c)
[2.105170] [c0853794] (dump_stack) from [c024bdf4]
(warn_slowpath_common+0x70/0x8c)
[2.105187] [c024bdf4] (warn_slowpath_common) from [c024beac]
(warn_slowpath_null+0x1c/0x24)
[2.105205] [c024beac] (warn_slowpath_null) from [c02a3944]
(flush_smp_call_function_queue+0xc0/0x178)
[2.105222] [c02a3944] (flush_smp_call_function_queue) from
[c02a3a94] (hotplug_cfd+0x98/0xd8)
[2.105237] [c02a3a94] (hotplug_cfd) from [c026b064]
(notifier_call_chain+0x44/0x84)
[2.105252] [c026b064] (notifier_call_chain) from [c024c1a4]
(_cpu_up+0x120/0x170)
[2.105268] [c024c1a4] (_cpu_up) from [c024c264] (cpu_up+0x70/0x94)
[2.105285] [c024c264] (cpu_up) from [c0b5839c] (smp_init+0xac/0xb0)
[2.105301] [c0b5839c] (smp_init) from [c0b2fc54]
(kernel_init_freeable+0x90/0x1dc)
[2.105316] [c0b2fc54] (kernel_init_freeable) from [c0851248]
(kernel_init+0xc/0xe8)
[2.105330] [c0851248] (kernel_init) from [c020e7f8]
(ret_from_fork+0x14/0x3c)
[2.105339] ---[ end trace bc66942e4ab63169 ]---
[3.105064] CPU6: failed to come online
[3.105089] [ cut here ]
[3.105107] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228
flush_smp_call_function_queue+0xc0/0x178()
[3.105115] Modules linked in:
[3.105131] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW
3.15.0-next-20140616-2-g38f9385a061b #2035
[3.105150] [c02160f0] (unwind_backtrace) from [c0211c8c]
(show_stack+0x10/0x14)
[3.105168] [c0211c8c] (show_stack) from [c0853794]
(dump_stack+0x8c/0x9c)
[3.105185] [c0853794] (dump_stack) from [c024bdf4]
(warn_slowpath_common+0x70/0x8c)
[3.105202] [c024bdf4] (warn_slowpath_common) from [c024beac]
(warn_slowpath_null+0x1c/0x24)
[3.105220] [c024beac] (warn_slowpath_null) from [c02a3944]
(flush_smp_call_function_queue+0xc0/0x178)
[3.105237] [c02a3944] (flush_smp_call_function_queue) from
[c02a3a94] (hotplug_cfd+0x98/0xd8)
[3.105252] [c02a3a94] (hotplug_cfd) from [c026b064]
(notifier_call_chain+0x44/0x84)
[3.105267] [c026b064] (notifier_call_chain) from [c024c1a4]
(_cpu_up+0x120/0x170)
[3.105283] [c024c1a4] (_cpu_up) 

Re: Boot warnings on exynos5420 based boards

2014-06-17 Thread Srivatsa S. Bhat
Hi Sachin,

On 06/17/2014 01:39 PM, Sachin Kamat wrote:
 Hi,
 
 I observe the below warnings while trying to boot Exynos5420 based boards
 since yesterday's linux-next (next-20140616) using multi_v7_defconfig. Looks

I guess you meant next-20140617.

 like it is triggered by the commit 56e6921829 (CPU hotplug, smp:
 flush any pending IPI callbacks before CPU offline). Any ideas?
 
 
 *
 [0.046521] Exynos MCPM support installed
 [0.048939] CPU1: Booted secondary processor
 [0.065005] CPU1: update cpu_capacity 1535
 [0.065011] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
 [0.065660] CPU2: Booted secondary processor
 [0.085005] CPU2: update cpu_capacity 1535
 [0.085012] CPU2: thread -1, cpu 2, socket 0, mpidr 8002
 [0.085662] CPU3: Booted secondary processor
 [0.105005] CPU3: update cpu_capacity 1535
 [0.105011] CPU3: thread -1, cpu 3, socket 0, mpidr 8003
 [1.105031] CPU4: failed to come online
 [1.105081] [ cut here ]
 [1.105104] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228
 flush_smp_call_function_queue+0xc0/0x178()
 [1.105112] Modules linked in:
 [1.105129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
 3.15.0-next-20140616-2-g38f9385a061b #2035
 [1.105157] [c02160f0] (unwind_backtrace) from [c0211c8c]
 (show_stack+0x10/0x14)
 [1.105179] [c0211c8c] (show_stack) from [c0853794]
 (dump_stack+0x8c/0x9c)
 [1.105198] [c0853794] (dump_stack) from [c024bdf4]
 (warn_slowpath_common+0x70/0x8c)
 [1.105216] [c024bdf4] (warn_slowpath_common) from [c024beac]
 (warn_slowpath_null+0x1c/0x24)
 [1.105235] [c024beac] (warn_slowpath_null) from [c02a3944]
 (flush_smp_call_function_queue+0xc0/0x178)
 [1.105253] [c02a3944] (flush_smp_call_function_queue) from
 [c02a3a94] (hotplug_cfd+0x98/0xd8)
 [1.105269] [c02a3a94] (hotplug_cfd) from [c026b064]
 (notifier_call_chain+0x44/0x84)
 [1.105285] [c026b064] (notifier_call_chain) from [c024c1a4]
 (_cpu_up+0x120/0x170)
 [1.105302] [c024c1a4] (_cpu_up) from [c024c264] (cpu_up+0x70/0x94)
 [1.105319] [c024c264] (cpu_up) from [c0b5839c] (smp_init+0xac/0xb0)
 [1.105337] [c0b5839c] (smp_init) from [c0b2fc54]
 (kernel_init_freeable+0x90/0x1dc)
 [1.105353] [c0b2fc54] (kernel_init_freeable) from [c0851248]
 (kernel_init+0xc/0xe8)
 [1.105368] [c0851248] (kernel_init) from [c020e7f8]
 (ret_from_fork+0x14/0x3c)
 [1.105389] ---[ end trace bc66942e4ab63168 ]---

Argh! I had put the switch-case handling for CPU_DYING at the 'wrong' place,
since I hadn't noticed that CPU_UP_CANCELED silently falls-through to CPU_DEAD.
This is what happens when people don't explicitly write fall-through in the
comments in a switch-case statement :-(

Below is an updated patch, please let me know how it goes. (You'll have to
revert c47a9d7cca first, and then 56e692182, before trying this patch).

[c47a9d7cca - CPU hotplug, smp: Execute any pending IPI callbacks before CPU
  offline]
[56e692182  - CPU hotplug, smp: flush any pending IPI callbacks before CPU
  offline]

Andrew, can you please use this patch instead?

Thanks a lot!

--

From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
[PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline

There is a race between the CPU offline code (within stop-machine) and the
smp-call-function code, which can lead to getting IPIs on the outgoing CPU,
*after* it has gone offline.

Specifically, this can happen when using smp_call_function_single_async() to
send the IPI, since this API allows sending asynchronous IPIs from IRQ
disabled contexts. The exact race condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the
_DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the
other CPUs disable their local interrupts. Due to this, we can encounter a
situation in which an IPI is sent by one of the other CPUs to the outgoing CPU
(while it is *still* online), but the outgoing CPU ends up noticing it only
*after* it has gone offline.

  CPU 1 CPU 2
  (Online CPU)   (CPU going offline)

   Enter _PREPARE stage  Enter _PREPARE stage

 Enter _DISABLE_IRQ stage

   =
   Got a device interrupt, and | Didn't notice the IPI
   the interrupt handler sent an   | since interrupts were
   IPI to CPU 2 using  | disabled on this CPU.
   smp_call_function_single_async()|
   =

   Enter _DISABLE_IRQ stage

   Enter _RUN stage  Enter _RUN 

Re: Boot warnings on exynos5420 based boards

2014-06-17 Thread Sachin Kamat
Hi Srivatsa,

Thanks for your prompt reply.

On Tue, Jun 17, 2014 at 2:48 PM, Srivatsa S. Bhat
srivatsa.b...@linux.vnet.ibm.com wrote:
 Hi Sachin,

 On 06/17/2014 01:39 PM, Sachin Kamat wrote:
 Hi,

 I observe the below warnings while trying to boot Exynos5420 based boards
 since yesterday's linux-next (next-20140616) using multi_v7_defconfig. Looks

 I guess you meant next-20140617.

I meant I started observing this warning next-20140616 onwards
(next-20140617 as well).


 like it is triggered by the commit 56e6921829 (CPU hotplug, smp:
 flush any pending IPI callbacks before CPU offline). Any ideas?


 *
 [0.046521] Exynos MCPM support installed
 [0.048939] CPU1: Booted secondary processor
 [0.065005] CPU1: update cpu_capacity 1535
 [0.065011] CPU1: thread -1, cpu 1, socket 0, mpidr 8001
 [0.065660] CPU2: Booted secondary processor
 [0.085005] CPU2: update cpu_capacity 1535
 [0.085012] CPU2: thread -1, cpu 2, socket 0, mpidr 8002
 [0.085662] CPU3: Booted secondary processor
 [0.105005] CPU3: update cpu_capacity 1535
 [0.105011] CPU3: thread -1, cpu 3, socket 0, mpidr 8003
 [1.105031] CPU4: failed to come online
 [1.105081] [ cut here ]
 [1.105104] WARNING: CPU: 0 PID: 1 at kernel/smp.c:228
 flush_smp_call_function_queue+0xc0/0x178()
 [1.105112] Modules linked in:
 [1.105129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
 3.15.0-next-20140616-2-g38f9385a061b #2035
 [1.105157] [c02160f0] (unwind_backtrace) from [c0211c8c]
 (show_stack+0x10/0x14)
 [1.105179] [c0211c8c] (show_stack) from [c0853794]
 (dump_stack+0x8c/0x9c)
 [1.105198] [c0853794] (dump_stack) from [c024bdf4]
 (warn_slowpath_common+0x70/0x8c)
 [1.105216] [c024bdf4] (warn_slowpath_common) from [c024beac]
 (warn_slowpath_null+0x1c/0x24)
 [1.105235] [c024beac] (warn_slowpath_null) from [c02a3944]
 (flush_smp_call_function_queue+0xc0/0x178)
 [1.105253] [c02a3944] (flush_smp_call_function_queue) from
 [c02a3a94] (hotplug_cfd+0x98/0xd8)
 [1.105269] [c02a3a94] (hotplug_cfd) from [c026b064]
 (notifier_call_chain+0x44/0x84)
 [1.105285] [c026b064] (notifier_call_chain) from [c024c1a4]
 (_cpu_up+0x120/0x170)
 [1.105302] [c024c1a4] (_cpu_up) from [c024c264] (cpu_up+0x70/0x94)
 [1.105319] [c024c264] (cpu_up) from [c0b5839c] (smp_init+0xac/0xb0)
 [1.105337] [c0b5839c] (smp_init) from [c0b2fc54]
 (kernel_init_freeable+0x90/0x1dc)
 [1.105353] [c0b2fc54] (kernel_init_freeable) from [c0851248]
 (kernel_init+0xc/0xe8)
 [1.105368] [c0851248] (kernel_init) from [c020e7f8]
 (ret_from_fork+0x14/0x3c)
 [1.105389] ---[ end trace bc66942e4ab63168 ]---

 Argh! I had put the switch-case handling for CPU_DYING at the 'wrong' place,
 since I hadn't noticed that CPU_UP_CANCELED silently falls-through to 
 CPU_DEAD.
 This is what happens when people don't explicitly write fall-through in the
 comments in a switch-case statement :-(

 Below is an updated patch, please let me know how it goes. (You'll have to
 revert c47a9d7cca first, and then 56e692182, before trying this patch).

I am unable to apply your below patch on top of the above 2 reverts.
Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline
fatal: corrupt patch at line 106
Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI
callbacks before CPU offline

Even with 'patch' I get the below failures:
patching file kernel/smp.c
Hunk #2 FAILED at 53.
Hunk #3 FAILED at 179.
2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej

Regards,
Sachin.
--
To unsubscribe from this list: send the line unsubscribe linux-samsung-soc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot warnings on exynos5420 based boards

2014-06-17 Thread Srivatsa S. Bhat
On 06/17/2014 03:03 PM, Sachin Kamat wrote:

 Below is an updated patch, please let me know how it goes. (You'll have to
 revert c47a9d7cca first, and then 56e692182, before trying this patch).
 
 I am unable to apply your below patch on top of the above 2 reverts.
 Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU 
 offline
 fatal: corrupt patch at line 106
 Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI
 callbacks before CPU offline
 
 Even with 'patch' I get the below failures:
 patching file kernel/smp.c
 Hunk #2 FAILED at 53.
 Hunk #3 FAILED at 179.
 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej
 

Hmm, weird. My mailer must have screwed it up.

Let's try again:

[In case this also doesn't work for you, please use this git tree in which
 I have reverted the 2 old commits and added this updated patch.

 git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3
]


From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

[PATCH] CPU hotplug, smp: Execute any pending IPI callbacks before CPU offline

There is a race between the CPU offline code (within stop-machine) and the
smp-call-function code, which can lead to getting IPIs on the outgoing CPU,
*after* it has gone offline.

Specifically, this can happen when using smp_call_function_single_async() to
send the IPI, since this API allows sending asynchronous IPIs from IRQ
disabled contexts. The exact race condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the
_DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the
other CPUs disable their local interrupts. Due to this, we can encounter a
situation in which an IPI is sent by one of the other CPUs to the outgoing CPU
(while it is *still* online), but the outgoing CPU ends up noticing it only
*after* it has gone offline.

  CPU 1 CPU 2
  (Online CPU)   (CPU going offline)

   Enter _PREPARE stage  Enter _PREPARE stage

 Enter _DISABLE_IRQ stage

   =
   Got a device interrupt, and | Didn't notice the IPI
   the interrupt handler sent an   | since interrupts were
   IPI to CPU 2 using  | disabled on this CPU.
   smp_call_function_single_async()|
   =

   Enter _DISABLE_IRQ stage

   Enter _RUN stage  Enter _RUN stage

  =
   Busy loop with interrupts  |  Invoke take_cpu_down()
   disabled.  |  and take CPU 2 offline
  =

   Enter _EXIT stage Enter _EXIT stage

   Re-enable interrupts  Re-enable interrupts

 The pending IPI is noted
 immediately, but alas,
 the CPU is offline at
 this point.

This of course, makes the smp-call-function IPI handler code running on CPU 2
unhappy and it complains about receiving an IPI on an offline CPU.
One real example of the scenario on CPU 1 is the block layer's
complete-request call-path:
__blk_complete_request() [interrupt-handler]
raise_blk_irq()
smp_call_function_single_async()


However, if we look closely, the block layer does check that the target CPU is
online before firing the IPI. So in this case, it is actually the unfortunate
ordering/timing of events in the stop-machine phase that leads to receiving
IPIs after the target CPU has gone offline.

In reality, getting a late IPI on an offline CPU is not too bad by itself
(this can happen even due to hardware latencies in IPI send-receive). It is
a bug only if the target CPU really went offline without executing all the
callbacks queued on its list. (Note that a CPU is free to execute its pending
smp-call-function callbacks in a batch, without waiting for the corresponding
IPIs to arrive for each one of those callbacks).

So, fixing this issue can be broken up into two parts:
1. Ensure that a CPU goes offline only after executing all the callbacks
   queued on it.
2. Modify the warning condition in the smp-call-function IPI handler code
   such that it warns only if an offline CPU got an IPI *and* that CPU had
   gone offline with callbacks still pending in its queue.

Achieving part 1 is straight-forward - just flush (execute) all the queued
callbacks on the outgoing CPU in the CPU_DYING stage[1], including those
callbacks for 

Re: Boot warnings on exynos5420 based boards

2014-06-17 Thread Sachin Kamat
Hi Srivatsa,

On Tue, Jun 17, 2014 at 3:24 PM, Srivatsa S. Bhat
srivatsa.b...@linux.vnet.ibm.com wrote:
 On 06/17/2014 03:03 PM, Sachin Kamat wrote:

 Below is an updated patch, please let me know how it goes. (You'll have to
 revert c47a9d7cca first, and then 56e692182, before trying this patch).

 I am unable to apply your below patch on top of the above 2 reverts.
 Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU 
 offline
 fatal: corrupt patch at line 106
 Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI
 callbacks before CPU offline

 Even with 'patch' I get the below failures:
 patching file kernel/smp.c
 Hunk #2 FAILED at 53.
 Hunk #3 FAILED at 179.
 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej


 Hmm, weird. My mailer must have screwed it up.

 Let's try again:

 [In case this also doesn't work for you, please use this git tree in which
  I have reverted the 2 old commits and added this updated patch.

  git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3
 ]

Unfortunately the attached patch did not apply either. Nevertheless, I
applied the
patch from your above mentioned tree. With that patch I do not see the warnings
that I mentioned in my first mail. Thanks for fixing it.

Regards,
Sachin.
--
To unsubscribe from this list: send the line unsubscribe linux-samsung-soc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot warnings on exynos5420 based boards

2014-06-17 Thread Srivatsa S. Bhat
On 06/17/2014 05:25 PM, Sachin Kamat wrote:
 Hi Srivatsa,
 
 On Tue, Jun 17, 2014 at 3:24 PM, Srivatsa S. Bhat
 srivatsa.b...@linux.vnet.ibm.com wrote:
 On 06/17/2014 03:03 PM, Sachin Kamat wrote:

 Below is an updated patch, please let me know how it goes. (You'll have to
 revert c47a9d7cca first, and then 56e692182, before trying this patch).

 I am unable to apply your below patch on top of the above 2 reverts.
 Applying: CPU hotplug, smp: Execute any pending IPI callbacks before CPU 
 offline
 fatal: corrupt patch at line 106
 Patch failed at 0001 CPU hotplug, smp: Execute any pending IPI
 callbacks before CPU offline

 Even with 'patch' I get the below failures:
 patching file kernel/smp.c
 Hunk #2 FAILED at 53.
 Hunk #3 FAILED at 179.
 2 out of 3 hunks FAILED -- saving rejects to file kernel/smp.c.rej


 Hmm, weird. My mailer must have screwed it up.

 Let's try again:

 [In case this also doesn't work for you, please use this git tree in which
  I have reverted the 2 old commits and added this updated patch.

  git://github.com/srivatsabhat/linux.git ipi-offline-fix-v3
 ]
 
 Unfortunately the attached patch did not apply either. Nevertheless, I
 applied the
 patch from your above mentioned tree. With that patch I do not see the 
 warnings
 that I mentioned in my first mail. Thanks for fixing it.
 

Sure, thanks for reporting the bug and testing the updated patch!
By the way, I think there is some problem in the workflow that you use to
copy-paste/apply the patch. I tried applying both patches (that I sent in
2 different mails) and both applied properly without any problems.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line unsubscribe linux-samsung-soc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html