On Thu, 1 Sep 2016, Viresh Kumar wrote:
> On 31-08-16, 12:46, Alan Stern wrote:
> > On Wed, 31 Aug 2016, Viresh Kumar wrote:
> >
> > > On 05-08-16, 11:51, Alan Stern wrote:
> > > > +++ usb-4.x/drivers/usb/core/hub.c
> > > > @@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub
> > > >
> > > > /* Continue a partial initialization */
> > > > if (type == HUB_INIT2 || type == HUB_INIT3) {
> > > > - device_lock(hub->intfdev);
> > > > + device_lock(&hdev->dev);
> > >
> > > Hi Alan,
> > >
> > > I have received reports of kernel crashes (NULL dereference) due to this
> > > patch
> > > in some of the corner cases. Note that we have backported this patch (and
> > > few
> > > other) to 3.10 kernel. I have attached my hub.c file as well for
> > > reference.
> > >
> > > Here is the reported kernel OOPs:
> > >
> > > [ 19.476228] Unable to handle kernel NULL pointer dereference at
> > > virtual address 00000000
> > > [ 19.476231] pgd = ffffffc00007d000
> > > [ 19.476236] [00000000] *pgd=000000000e90b003, *pmd=000000000e90c003,
> > > *pte=00e00000f9000407
> > > [ 19.476242] Internal error: Oops: 96000045 [#1] PREEMPT SMP
> > > [ 19.476273] Modules linked in: gb_vibrator(O) gb_usb(O) gb_uart(O)
> > > gb_spi(O) gb_sdio(O) gb_raw(O) gb_pwm(O) gb_power_supply(O) gb)
> > > [ 19.476279] CPU: 0 PID: 344 Comm: kworker/0:3 Tainted: G O
> > > 3.10.97-g4b7224f-dirty #454
> > > [ 19.476290] Workqueue: events hub_init_func2
> > > [ 19.476293] task: ffffffc09b3560c0 ti: ffffffc09ada8000 task.ti:
> > > ffffffc09ada8000
> > > [ 19.476300] PC is at __mutex_lock_slowpath+0x138/0x224
> > > [ 19.476303] LR is at __mutex_lock_slowpath+0x128/0x224
> > ...
> > > [ 19.476582] Call trace:
> > > [ 19.476586] [<ffffffc000ccf13c>] __mutex_lock_slowpath+0x138/0x224
> > > [ 19.476590] [<ffffffc000ccf254>] mutex_lock+0x2c/0x48
> > > [ 19.476593] [<ffffffc0006f6eac>] hub_activate+0x50/0x4d8
> > > [ 19.476596] [<ffffffc0006f7388>] hub_init_func2+0x14/0x1c
> > > [ 19.476602] [<ffffffc0002387ac>] process_one_work+0x26c/0x3cc
> > > [ 19.476605] [<ffffffc000239988>] worker_thread+0x208/0x358
> > > [ 19.476610] [<ffffffc00023f360>] kthread+0xbc/0xc4
> > ...
> > > This happens when the device is infinitely generating connected and then
> > > removed
> > > (not manually, but due to some hardware issues).
> >
> > If I'm reading this right, it means that hub->hdev is NULL in
>
> I am not sure I am in sync here :(
>
> > hub_activate().
>
> We would have gotten the crash right from hub_activate() in that case, isn't
> it?
>
> The fact that the call sequence reached mutex_lock() here, it means that
> hub->hdev->dev was valid at least. The mutex dev->mutex is somewhat corrupted
> or
> uninitialized, etc..
Ah, that makes a lot more sense. Thanks for staightening me out. And
some pointer embedded in the mutex must have been NULL.
> And that's where it all went wrong. As
> __mutex_lock_slowpath() is called, it means that the mutex had a count of 0
> instead of 1 during the lock and then we crashed during
> __mutex_lock_slowpath(),
> which can only happen if the mutex is uninitialized in the first place.
>
> The mutex gets initialized as part of device_add() and so things can go wrong
> if
> device_add() was skipped here somehow.
>
> I may be completely wrong, but that's what I read :)
Okay. But I don't see how device_add() could have been skipped. The
hub_init_func2() call occurs after the original HUB_INIT hub_activate()
call, which is in hub_configure() and occurs during probing. The hub
interface doesn't get probed until the hub device is registered (the
hub interface is a child of the hub device).
Another possibility is that the mutex _was_ initialized but got
corrupted somehow. Of course, that kind of thing is very hard to track
down.
On Thu, 1 Sep 2016, Vaibhav Hiremath wrote:
> I have some more update on this,
>
> It seems the culprit was my laptop USB port (I have to say bad port),
> which resulted into continuous connect/disconnect event, on both
> laptop and phone (Android), which eventually resulted into kernel panic.
Well, the bad port was the trigger. But even with a bad port, the
kernel should not panic.
> It could be a memory corruption or race somewhere (not sure though),
> which is where device is not initialized properly when execution reached
> to hub_activate().
>
> I connected phone to another USB port and things started working as
> expected.
>
> I can put prints to trace the execution flow, lets see what comes out
> from it.
How easily can you reproduce the problem?
Alan Stern
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html