On 01/13/2017 01:53 PM, Grygorii Strashko wrote:
>
>
> On 01/13/2017 01:01 PM, Tony Lindgren wrote:
>> * Grygorii Strashko <[email protected]> [170113 10:37]:
>>>
>>>
>>> On 01/13/2017 12:01 PM, Tony Lindgren wrote:
>>>> Commit fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
>>>> together with recent MUSB changes allowed USB and DMA on BeagleBone to idle
>>>> when no cable is connected. But looks like few corner case issues still
>>>> remain.
>>>>
>>>> Looks like just by re-plugging USB cable about ten or so times on
>>>> BeagleBone
>>>> when configured in USB peripheral mode we can get warnings and eventually
>>>> trigger an oops in cppi41 DMA:
>>>>
>>>> WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:1154
>>>> cppi41_runtime_suspend+
>>>> x28/0x38 [cppi41]
>>>> ...
>>>>
>>>> WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:452
>>>> push_desc_queue+0x94/0x9c [cppi41]
>>>> ...
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual
>>>> address 00000104
>>>> pgd = c0004000
>>>> [00000104] *pgd=00000000
>>>> Internal error: Oops: 805 [#1] SMP ARM
>>>> ...
>>>> [<bf0d92cc>] (cppi41_runtime_resume [cppi41]) from [<c0589838>]
>>>> (__rpm_callback+0xc0/0x214)
>>>> [<c0589838>] (__rpm_callback) from [<c05899ac>] (rpm_callback+0x20/0x80)
>>>> [<c05899ac>] (rpm_callback) from [<c0589460>] (rpm_resume+0x504/0x78c)
>>>> [<c0589460>] (rpm_resume) from [<c058a1a0>] (pm_runtime_work+0x60/0xa8)
>>>> [<c058a1a0>] (pm_runtime_work) from [<c0156120>]
>>>> (process_one_work+0x2b4/0x808)
>>>>
>>>> This is because of a race with runtime PM and cppi41_dma_issue_pending()
>>>> as reported by Alexandre Bailon <[email protected]> in earlier
>>>> set of patches. Based on mailing list discussions we however came to the
>>>> conclusion that a different fix from Alexandre's fix is needed in order
>>>> to guarantee that DMA is really active when we try to use it.
>>>>
>>>> To fix the issue, we need to add a driver specific flag as we otherwise
>>>> can have -EINPROGRESS state set by runtime PM and can't rely on
>>>> pm_runtime_active() to tell us when we can use the DMA.
>>>>
>>>> And we need to make sure the DMA transfers get triggered in the queued
>>>> order. So let's always queue the transfers, then flush the queue
>>>> from both cppi41_dma_issue_pending() and cppi41_runtime_resume()
>>>> as suggested by Grygorii Strashko <[email protected]> in an
>>>> earlier example patch.
>>>>
>>>> For reference, this is also documented in
>>>> Documentation/power/runtime_pm.txt
>>>> in the example at the end of the file as pointed out by Grygorii Strashko
>>>> <[email protected]>.
>>>>
>>>> Based on earlier patches from Alexandre Bailon <[email protected]>
>>>> and Grygorii Strashko <[email protected]> modified based on
>>>> testing and what was discussed on the mailing lists.
>>>>
>>>> Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
>>>> Cc: Andy Shevchenko <[email protected]>
>>>> Cc: Bin Liu <[email protected]>
>>>> Cc: Grygorii Strashko <[email protected]>
>>>> Cc: Kevin Hilman <[email protected]>
>>>> Cc: Patrick Titiano <[email protected]>
>>>> Cc: Sergei Shtylyov <[email protected]>
>>>> Reported-by: Alexandre Bailon <[email protected]>
>>>> Signed-off-by: Tony Lindgren <[email protected]>
>>>> ---
>>>>
>>>> Alexandre & Grygorii, can you guys please review and test?
>>>
>>> I've just tested it on BBB. And it triggers warning from IRQ
>>>
>>> root@am335x-evm:~# [ 242.280546] ------------[ cut here ]------------
>>> [ 242.285532] WARNING: CPU: 0 PID: 0 at drivers/dma/cppi41.c:330
>>> cppi41_irq+0x228/0x260 [cppi41]
>>> [ 242.335164] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W
>>> 4.10.0-rc3-00560-g51afc29 #133
>>> [ 242.344601] Hardware name: Generic AM33XX (Flattened Device Tree)
>>> [ 242.351084] [<c011013c>] (unwind_backtrace) from [<c010c300>]
>>> (show_stack+0x10/0x14)
>>> [ 242.359269] [<c010c300>] (show_stack) from [<c04a0278>]
>>> (dump_stack+0xac/0xe0)
>>> [ 242.366919] [<c04a0278>] (dump_stack) from [<c013700c>]
>>> (__warn+0xd8/0x104)
>>> [ 242.374282] [<c013700c>] (__warn) from [<c01370e4>]
>>> (warn_slowpath_null+0x20/0x28)
>>> [ 242.382299] [<c01370e4>] (warn_slowpath_null) from [<bf1d0384>]
>>> (cppi41_irq+0x228/0x260 [cppi41])
>>> [ 242.391714] [<bf1d0384>] (cppi41_irq [cppi41]) from [<c01a6888>]
>>> (__handle_irq_event_percpu+0x48/0x3b4)
>>> [ 242.401727] [<c01a6888>] (__handle_irq_event_percpu) from [<c01a6c10>]
>>> (handle_irq_event_percpu+0x1c/0x58)
>>> [ 242.412017] [<c01a6c10>] (handle_irq_event_percpu) from [<c01a6c84>]
>>> (handle_irq_event+0x38/0x5c)
>>> [ 242.421477] [<c01a6c84>] (handle_irq_event) from [<c01aa004>]
>>> (handle_level_irq+0xc0/0x154)
>>> [ 242.430378] [<c01aa004>] (handle_level_irq) from [<c01a5b9c>]
>>> (generic_handle_irq+0x20/0x34)
>>> [ 242.439378] [<c01a5b9c>] (generic_handle_irq) from [<c01a60f0>]
>>> (__handle_domain_irq+0x64/0xe0)
>>> [ 242.448660] [<c01a60f0>] (__handle_domain_irq) from [<c07f92f0>]
>>> (__irq_svc+0x70/0x98)
>>> [ 242.457117] [<c07f92f0>] (__irq_svc) from [<c010836c>]
>>> (arch_cpu_idle+0x20/0x3c)
>>> [ 242.464935] [<c010836c>] (arch_cpu_idle) from [<c018bf7c>]
>>> (do_idle+0x164/0x218)
>>> [ 242.472748] [<c018bf7c>] (do_idle) from [<c018c39c>]
>>> (cpu_startup_entry+0x18/0x1c)
>>> [ 242.480758] [<c018c39c>] (cpu_startup_entry) from [<c0b00c30>]
>>> (start_kernel+0x344/0x3bc)
>>> [ 242.489376] ---[ end trace 12c5b6488c1e8c75 ]---
>>> [ 242.496525] usb 1-1.3: USB disconnect, device number 4
>>>
>>> test sequence:
>>> - plug usb hub + cd card reader
>>> - plug usb stick in hub
>>> 37 mount /dev/sdb /media
>>> 38 ls /media/
>>> 39 rm /media/data
>>> 40 time dd if=/dev/zero of=/media/data bs=128k count=100
>>> 41 umount /media
>>> - unplug USB stick - > Warning
>>>
>>> ^ The same sequence without Hub -- > no warnings
>>>
>>> When I plug-unplug USB stick from Hub I can reproduce it pretty often :(
>>> but it can be different issue :(
>>
>> Hmm interesting, I'll try it here too. I wonder if adding spin_lock_irqsave/
>> restore around the WARN_ON() in the interrupt handler is enough to make it
>> go away?
>>
>
> I think not :) Most probably the PM runtime autosuspend is too small for the
> USB hub case.
>
> I've applied below diff and can see that PM runtime suspend happens before
> IRQ is triggered and there is one pending descs always.
>
> Then instead of trying to adjust autosuspend delay I tried to
> maintain proper PM runtime counter:
> - push desc -> pm_runtime_get -> counter ++
> - irq -> pop desc -> pm_runtime_put_autosuspend -> counter--
> which means "Keep cppi active until as least on desc is in progress"/
>
> As result with above change I do not see any warning any more - i've tried
> two different hubs.
>
> I saw your 098de42 "dmaengine: cppi41: Fix unpaired pm runtime when only a
> USB hub is connected",
> but, to be honest, I did not get how is it possible to get unpaired get/put
> from your description:
> - descriptor pushed in to the Q should be returned back as part of normal
> operation
> or as part of cppi41_tear_down_chan. May be you've hit an issue with
> tear_down?
>
>
Simplified diff with fix on top of your patch:
diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
index ce37a1a..9e9403a 100644
--- a/drivers/dma/cppi41.c
+++ b/drivers/dma/cppi41.c
@@ -319,12 +319,6 @@ static irqreturn_t cppi41_irq(int irq, void *data)
while (val) {
u32 desc, len;
- int error;
-
- error = pm_runtime_get(cdd->ddev.dev);
- if ((error != -EINPROGRESS) && (error < 0))
- dev_err(cdd->ddev.dev, "%s pm runtime get:
%i\n",
- __func__, error);
/* This warning should never trigger */
WARN_ON(cdd->is_suspended);
@@ -500,7 +494,6 @@ static void cppi41_dma_issue_pending(struct dma_chan *chan)
spin_unlock_irqrestore(&cdd->lock, flags);
pm_runtime_mark_last_busy(cdd->ddev.dev);
- pm_runtime_put_autosuspend(cdd->ddev.dev);
}
static u32 get_host_pd0(u32 length)
--
regards,
-grygorii
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html