Re: [PATCH RESEND v3 0/5] media: uvcvideo: Fix race conditions

2021-03-12 Thread Guenter Roeck
On 3/11/21 11:36 PM, Dominique MARTINET wrote:
> Hi,
> 
> Guenter Roeck wrote on Thu, Sep 17, 2020 at 07:16:17PM -0700:
>> On 9/17/20 5:47 AM, Laurent Pinchart wrote:
>>> On Wed, Sep 16, 2020 at 07:25:42PM -0700, Guenter Roeck wrote:
>>>> Something seems to have gone wrong with v3 of this patch series.
>>>> I am sure I sent it out, but I don't find it anywhere.
>>>> Resending. Sorry for any duplicates.
>>>
>>> I haven't checked the mailing list, but I've found it in my inbox :-)
>>> I'm not forgetting about you, just been fairly busy recently. I still
>>> plan to try and provide an alternative implementation in the V4L2 core
>>> (in a form that I think should even be moved to the cdev core) that
>>> would fix this for all drivers.
>>>
>> Thanks for letting me know. As it turns out, this problem is responsible
>> for about 2% of all Chromebook crashes, so I'll probably not wait for
>> the series to be accepted upstream but apply it as-is to the various
>> ChromeOS kernel branches.
> 
> We have a customer who reported the same issue recently. Has there been
> any development?
> 

Not that I know of. We applied the series to all Chrome OS kernel branches,
and it reliably fixes the problem for us. We'd like to have the problem
fixed upstream; until that happens we'll have to carry the series forward.

> I don't see anything in either uvc or v4l2 that would address the race
> since this mail half a year ago (well, I could have missed it ;))
> 

The problem still exists in the upstream kernel.

Guenter


Re: [PATCH RESEND v3 0/5] media: uvcvideo: Fix race conditions

2021-03-11 Thread Dominique MARTINET
Hi,

Guenter Roeck wrote on Thu, Sep 17, 2020 at 07:16:17PM -0700:
> On 9/17/20 5:47 AM, Laurent Pinchart wrote:
> > On Wed, Sep 16, 2020 at 07:25:42PM -0700, Guenter Roeck wrote:
> >> Something seems to have gone wrong with v3 of this patch series.
> >> I am sure I sent it out, but I don't find it anywhere.
> >> Resending. Sorry for any duplicates.
> > 
> > I haven't checked the mailing list, but I've found it in my inbox :-)
> > I'm not forgetting about you, just been fairly busy recently. I still
> > plan to try and provide an alternative implementation in the V4L2 core
> > (in a form that I think should even be moved to the cdev core) that
> > would fix this for all drivers.
> > 
> Thanks for letting me know. As it turns out, this problem is responsible
> for about 2% of all Chromebook crashes, so I'll probably not wait for
> the series to be accepted upstream but apply it as-is to the various
> ChromeOS kernel branches.

We have a customer who reported the same issue recently. Has there been
any development?

I don't see anything in either uvc or v4l2 that would address the race
since this mail half a year ago (well, I could have missed it ;))


If nothing has happened, I'll probably backport this series as well, at
which point it might make more sense to merge it now and revert it once
a better fix gets here...


Thanks!
-- 
Dominique


Re: [PATCH RESEND v3 0/5] media: uvcvideo: Fix race conditions

2020-09-17 Thread Guenter Roeck
Hi Laurent,

On 9/17/20 5:47 AM, Laurent Pinchart wrote:
> Hi Guenter,
> 
> On Wed, Sep 16, 2020 at 07:25:42PM -0700, Guenter Roeck wrote:
>> Something seems to have gone wrong with v3 of this patch series.
>> I am sure I sent it out, but I don't find it anywhere.
>> Resending. Sorry for any duplicates.
> 
> I haven't checked the mailing list, but I've found it in my inbox :-)
> I'm not forgetting about you, just been fairly busy recently. I still
> plan to try and provide an alternative implementation in the V4L2 core
> (in a form that I think should even be moved to the cdev core) that
> would fix this for all drivers.
> 
Thanks for letting me know. As it turns out, this problem is responsible
for about 2% of all Chromebook crashes, so I'll probably not wait for
the series to be accepted upstream but apply it as-is to the various
ChromeOS kernel branches.

> By the way, as you managed to get hold of non-UVC webcams, one thing you
> could try in your tests to make the drivers misbehave is to block on a
> DQBUF call, and unplug the device at that time. When blocking, DQBUF
> releases the driver lock (through the vb2ops .wait_prepare() and
> .wait_finish() operations for drivers based on vb2), so this may allow
> unregistration to proceed without waiting for userspace calls to
> complete.
> 

Good idea. I'll give it a try.

Thanks,
Guenter

>> The uvcvideo code has no lock protection against USB disconnects
>> while video operations are ongoing. This has resulted in random
>> error reports, typically pointing to a crash in usb_ifnum_to_if(),
>> called from usb_hcd_alloc_bandwidth(). A typical traceback is as
>> follows.
>>
>> usb 1-4: USB disconnect, device number 3
>> BUG: unable to handle kernel NULL pointer dereference at 
>> PGD 0 P4D 0
>> Oops:  [#1] PREEMPT SMP PTI
>> CPU: 0 PID: 5633 Comm: V4L2CaptureThre Not tainted 4.19.113-08536-g5d29ca36db06 #1
>> Hardware name: GOOGLE Edgar, BIOS Google_Edgar.7287.167.156 03/25/2019
>> RIP: 0010:usb_ifnum_to_if+0x29/0x40
>> Code: <...>
>> RSP: 0018:a46f42a47a80 EFLAGS: 00010246
>> RAX:  RBX:  RCX: 904a396c9000
>> RDX: 904a39641320 RSI: 0001 RDI: 
>> RBP: a46f42a47a80 R08: 0002 R09: 
>> R10: 9975 R11: 0009 R12: 
>> R13: 904a396b3800 R14: 904a39e88000 R15: 
>> FS: 7f396448e700() GS:904a3ba0() knlGS:
>> CS: 0010 DS:  ES:  CR0: 80050033
>> CR2:  CR3: 00016cb46000 CR4: 001006f0
>> Call Trace:
>>  usb_hcd_alloc_bandwidth+0x1ee/0x30f
>>  usb_set_interface+0x1a3/0x2b7
>>  uvc_video_start_transfer+0x29b/0x4b8 [uvcvideo]
>>  uvc_video_start_streaming+0x91/0xdd [uvcvideo]
>>  uvc_start_streaming+0x28/0x5d [uvcvideo]
>>  vb2_start_streaming+0x61/0x143 [videobuf2_common]
>>  vb2_core_streamon+0xf7/0x10f [videobuf2_common]
>>  uvc_queue_streamon+0x2e/0x41 [uvcvideo]
>>  uvc_ioctl_streamon+0x42/0x5c [uvcvideo]
>>  __video_do_ioctl+0x33d/0x42a
>>  video_usercopy+0x34e/0x5ff
>>  ? video_ioctl2+0x16/0x16
>>  v4l2_ioctl+0x46/0x53
>>  do_vfs_ioctl+0x50a/0x76f
>>  ksys_ioctl+0x58/0x83
>>  __x64_sys_ioctl+0x1a/0x1e
>>  do_syscall_64+0x54/0xde
>>
>> While there are not many references to this problem on mailing lists, it is
>> reported on a regular basis on various Chromebooks (roughly 300 reports
>> per month). The problem is relatively easy to reproduce by adding msleep()
>> calls into the code.
>>
>> I tried to reproduce the problem with non-uvcvideo webcams, but was
>> unsuccessful. I was unable to get Philips (pwc) webcams to work. gspca
>> based webcams don't experience the problem, or at least I was unable to
>> reproduce it (the gspca driver does not trigger sending USB messages in the
>> open function, and otherwise uses the locking mechanism provided by the
>> v4l2/vb2 core).
>>
>> I don't presume to claim that I found every issue, but this patch series
>> should fix at least the major problems.
>>
>> The patch series was tested extensively on a Chromebook running chromeos-4.19
>> and on a Linux system running a v5.8.y based kernel.
>>
>> v3:
>> - In patch 5/5, add missing calls to usb_autopm_put_interface() and kfree()
>>   to failure code path
>>
>> v2:
>> - Added details about problem frequency and testing with non-uvc webcams
>>   to summary
>> - In patch 4/5, return EPOLLERR instead of -ENODEV on poll errors
>> - Fix description in patch 5/5
>>
>> 
>> Guenter Roeck (5):
>>   media: uvcvideo: Cancel async worker earlier
>>   media: uvcvideo: Lock video streams and queues while unregistering
>>   media: uvcvideo: Release stream queue when unregistering video device
>>   media: uvcvideo: Protect uvc queue file operations against disconnect
>>   media: uvcvideo: Abort uvc_v4l2_open if video device is unregistered
>>

Re: [PATCH RESEND v3 0/5] media: uvcvideo: Fix race conditions

2020-09-17 Thread Laurent Pinchart
Hi Guenter,

On Wed, Sep 16, 2020 at 07:25:42PM -0700, Guenter Roeck wrote:
> Something seems to have gone wrong with v3 of this patch series.
> I am sure I sent it out, but I don't find it anywhere.
> Resending. Sorry for any duplicates.

I haven't checked the mailing list, but I've found it in my inbox :-)
I'm not forgetting about you, just been fairly busy recently. I still
plan to try and provide an alternative implementation in the V4L2 core
(in a form that I think should even be moved to the cdev core) that
would fix this for all drivers.

By the way, as you managed to get hold of non-UVC webcams, one thing you
could try in your tests to make the drivers misbehave is to block on a
DQBUF call, and unplug the device at that time. When blocking, DQBUF
releases the driver lock (through the vb2ops .wait_prepare() and
.wait_finish() operations for drivers based on vb2), so this may allow
unregistration to proceed without waiting for userspace calls to
complete.
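
As an illustration only, here is a single-threaded userspace model of
that window. Every fake_* name is hypothetical, a pthread mutex stands
in for the driver lock, and a trylock stands in for an unregistration
path that would really block on the lock. It is not vb2 code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a driver lock plus a "still registered" flag. */
struct fake_queue {
	pthread_mutex_t lock;
	bool registered;
};

/* Models .wait_prepare(): drop the lock before sleeping for a buffer. */
static void fake_wait_prepare(struct fake_queue *q)
{
	pthread_mutex_unlock(&q->lock);
}

/* Models .wait_finish(): retake the lock after waking up. */
static void fake_wait_finish(struct fake_queue *q)
{
	pthread_mutex_lock(&q->lock);
}

/* Unregistration succeeds only if nobody holds the lock right now. */
static int fake_try_unregister(struct fake_queue *q)
{
	int ret = pthread_mutex_trylock(&q->lock);

	if (ret)
		return -ret;	/* -EBUSY while an ioctl holds the lock */
	q->registered = false;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* A blocking dequeue, with the unplug injected while the lock is dropped. */
static int fake_dqbuf_with_unplug(struct fake_queue *q)
{
	int ret;

	pthread_mutex_lock(&q->lock);
	fake_wait_prepare(q);
	fake_try_unregister(q);		/* the unplug happens here */
	fake_wait_finish(q);
	ret = q->registered ? 0 : -ENODEV;	/* recheck after waking */
	pthread_mutex_unlock(&q->lock);
	return ret;
}
```

In this model unregistration is refused while the lock is held, but goes
through once the waiter has dropped it, so the waiter has to recheck the
device state after it wakes up.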

> The uvcvideo code has no lock protection against USB disconnects
> while video operations are ongoing. This has resulted in random
> error reports, typically pointing to a crash in usb_ifnum_to_if(),
> called from usb_hcd_alloc_bandwidth(). A typical traceback is as
> follows.
> 
> usb 1-4: USB disconnect, device number 3
> BUG: unable to handle kernel NULL pointer dereference at 
> PGD 0 P4D 0
> Oops:  [#1] PREEMPT SMP PTI
> CPU: 0 PID: 5633 Comm: V4L2CaptureThre Not tainted 4.19.113-08536-g5d29ca36db06 #1
> Hardware name: GOOGLE Edgar, BIOS Google_Edgar.7287.167.156 03/25/2019
> RIP: 0010:usb_ifnum_to_if+0x29/0x40
> Code: <...>
> RSP: 0018:a46f42a47a80 EFLAGS: 00010246
> RAX:  RBX:  RCX: 904a396c9000
> RDX: 904a39641320 RSI: 0001 RDI: 
> RBP: a46f42a47a80 R08: 0002 R09: 
> R10: 9975 R11: 0009 R12: 
> R13: 904a396b3800 R14: 904a39e88000 R15: 
> FS: 7f396448e700() GS:904a3ba0() knlGS:
> CS: 0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 00016cb46000 CR4: 001006f0
> Call Trace:
>  usb_hcd_alloc_bandwidth+0x1ee/0x30f
>  usb_set_interface+0x1a3/0x2b7
>  uvc_video_start_transfer+0x29b/0x4b8 [uvcvideo]
>  uvc_video_start_streaming+0x91/0xdd [uvcvideo]
>  uvc_start_streaming+0x28/0x5d [uvcvideo]
>  vb2_start_streaming+0x61/0x143 [videobuf2_common]
>  vb2_core_streamon+0xf7/0x10f [videobuf2_common]
>  uvc_queue_streamon+0x2e/0x41 [uvcvideo]
>  uvc_ioctl_streamon+0x42/0x5c [uvcvideo]
>  __video_do_ioctl+0x33d/0x42a
>  video_usercopy+0x34e/0x5ff
>  ? video_ioctl2+0x16/0x16
>  v4l2_ioctl+0x46/0x53
>  do_vfs_ioctl+0x50a/0x76f
>  ksys_ioctl+0x58/0x83
>  __x64_sys_ioctl+0x1a/0x1e
>  do_syscall_64+0x54/0xde
> 
> While there are not many references to this problem on mailing lists, it is
> reported on a regular basis on various Chromebooks (roughly 300 reports
> per month). The problem is relatively easy to reproduce by adding msleep()
> calls into the code.
> 
> I tried to reproduce the problem with non-uvcvideo webcams, but was
> unsuccessful. I was unable to get Philips (pwc) webcams to work. gspca
> based webcams don't experience the problem, or at least I was unable to
> reproduce it (the gspca driver does not trigger sending USB messages in the
> open function, and otherwise uses the locking mechanism provided by the
> v4l2/vb2 core).
> 
> I don't presume to claim that I found every issue, but this patch series
> should fix at least the major problems.
> 
> The patch series was tested extensively on a Chromebook running chromeos-4.19
> and on a Linux system running a v5.8.y based kernel.
> 
> v3:
> - In patch 5/5, add missing calls to usb_autopm_put_interface() and kfree()
>   to failure code path
> 
> v2:
> - Added details about problem frequency and testing with non-uvc webcams
>   to summary
> - In patch 4/5, return EPOLLERR instead of -ENODEV on poll errors
> - Fix description in patch 5/5
> 
> 
> Guenter Roeck (5):
>   media: uvcvideo: Cancel async worker earlier
>   media: uvcvideo: Lock video streams and queues while unregistering
>   media: uvcvideo: Release stream queue when unregistering video device
>   media: uvcvideo: Protect uvc queue file operations against disconnect
>   media: uvcvideo: Abort uvc_v4l2_open if video device is unregistered
> 
>  drivers/media/usb/uvc/uvc_ctrl.c   | 11 ++
>  drivers/media/usb/uvc/uvc_driver.c | 12 ++
>  drivers/media/usb/uvc/uvc_queue.c  | 32 +--
>  drivers/media/usb/uvc/uvc_v4l2.c   | 45 --
>  drivers/media/usb/uvc/uvcvideo.h   |  1 +
>  5 files changed, 93 insertions(+), 8 deletions(-)

-- 
Regards,

Laurent Pinchart


[PATCH RESEND v3 0/5] media: uvcvideo: Fix race conditions

2020-09-16 Thread Guenter Roeck
Something seems to have gone wrong with v3 of this patch series.
I am sure I sent it out, but I don't find it anywhere.
Resending. Sorry for any duplicates.

The uvcvideo code has no lock protection against USB disconnects
while video operations are ongoing. This has resulted in random
error reports, typically pointing to a crash in usb_ifnum_to_if(),
called from usb_hcd_alloc_bandwidth(). A typical traceback is as
follows.

usb 1-4: USB disconnect, device number 3
BUG: unable to handle kernel NULL pointer dereference at 
PGD 0 P4D 0
Oops:  [#1] PREEMPT SMP PTI
CPU: 0 PID: 5633 Comm: V4L2CaptureThre Not tainted 4.19.113-08536-g5d29ca36db06 #1
Hardware name: GOOGLE Edgar, BIOS Google_Edgar.7287.167.156 03/25/2019
RIP: 0010:usb_ifnum_to_if+0x29/0x40
Code: <...>
RSP: 0018:a46f42a47a80 EFLAGS: 00010246
RAX:  RBX:  RCX: 904a396c9000
RDX: 904a39641320 RSI: 0001 RDI: 
RBP: a46f42a47a80 R08: 0002 R09: 
R10: 9975 R11: 0009 R12: 
R13: 904a396b3800 R14: 904a39e88000 R15: 
FS: 7f396448e700() GS:904a3ba0() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2:  CR3: 00016cb46000 CR4: 001006f0
Call Trace:
 usb_hcd_alloc_bandwidth+0x1ee/0x30f
 usb_set_interface+0x1a3/0x2b7
 uvc_video_start_transfer+0x29b/0x4b8 [uvcvideo]
 uvc_video_start_streaming+0x91/0xdd [uvcvideo]
 uvc_start_streaming+0x28/0x5d [uvcvideo]
 vb2_start_streaming+0x61/0x143 [videobuf2_common]
 vb2_core_streamon+0xf7/0x10f [videobuf2_common]
 uvc_queue_streamon+0x2e/0x41 [uvcvideo]
 uvc_ioctl_streamon+0x42/0x5c [uvcvideo]
 __video_do_ioctl+0x33d/0x42a
 video_usercopy+0x34e/0x5ff
 ? video_ioctl2+0x16/0x16
 v4l2_ioctl+0x46/0x53
 do_vfs_ioctl+0x50a/0x76f
 ksys_ioctl+0x58/0x83
 __x64_sys_ioctl+0x1a/0x1e
 do_syscall_64+0x54/0xde

While there are not many references to this problem on mailing lists, it is
reported on a regular basis on various Chromebooks (roughly 300 reports
per month). The problem is relatively easy to reproduce by adding msleep()
calls into the code.
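
To make the class of bug concrete, here is a minimal userspace sketch of
a disconnect guard. It is not code from this series: struct fake_dev,
struct fake_intf and both functions are hypothetical, and a pthread
mutex stands in for whatever lock the driver would use. The point is
only that the streaming path and the disconnect path take the same lock,
and that the streaming path checks the state before touching a pointer
that disconnect may have cleared.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a USB interface; not a kernel structure. */
struct fake_intf {
	int altsetting;
};

/* Hypothetical device: 'intf' becomes NULL once the camera is unplugged. */
struct fake_dev {
	pthread_mutex_t lock;
	bool registered;
	struct fake_intf *intf;
};

/* Disconnect path: clear the state under the lock. */
static void fake_disconnect(struct fake_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->registered = false;
	dev->intf = NULL;
	pthread_mutex_unlock(&dev->lock);
}

/* Streaming path: check under the same lock before touching 'intf'. */
static int fake_streamon(struct fake_dev *dev)
{
	int ret = 0;

	pthread_mutex_lock(&dev->lock);
	if (!dev->registered)
		ret = -ENODEV;	/* bail out instead of dereferencing NULL */
	else
		dev->intf->altsetting = 1;
	pthread_mutex_unlock(&dev->lock);
	return ret;
}
```

Without the check, a fake_streamon() call that runs after
fake_disconnect() would dereference a NULL pointer, which is the shape
of the oops shown above.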

I tried to reproduce the problem with non-uvcvideo webcams, but was
unsuccessful. I was unable to get Philips (pwc) webcams to work. gspca
based webcams don't experience the problem, or at least I was unable to
reproduce it (the gspca driver does not trigger sending USB messages in the
open function, and otherwise uses the locking mechanism provided by the
v4l2/vb2 core).

I don't presume to claim that I found every issue, but this patch series
should fix at least the major problems.

The patch series was tested extensively on a Chromebook running chromeos-4.19
and on a Linux system running a v5.8.y based kernel.

v3:
- In patch 5/5, add missing calls to usb_autopm_put_interface() and kfree()
  to failure code path

v2:
- Added details about problem frequency and testing with non-uvc webcams
  to summary
- In patch 4/5, return EPOLLERR instead of -ENODEV on poll errors
- Fix description in patch 5/5
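
Some background on the EPOLLERR item, as a sketch rather than a
statement of why the change was requested: a poll handler returns an
event bitmask, not an errno, so a negative error code would be read as a
set of event bits. The userspace snippet below shows the arithmetic. It
assumes Linux values for ENODEV and the POLL* bits, uses the userspace
POLLERR as a stand-in for the kernel's EPOLLERR, and both helper names
are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* What a caller sees if a poll handler returns a negative errno as a mask. */
static unsigned int mask_from_errno(int err)
{
	return (unsigned int)-err;
}

/* What a caller sees if the handler reports the failure as an event. */
static unsigned int mask_from_error_event(void)
{
	return POLLERR;
}
```

With ENODEV the first helper yields a mask whose low bits include both
POLLIN and POLLOUT, so a caller could conclude that the device is
readable and writable. The second helper sets only the error bit.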


Guenter Roeck (5):
  media: uvcvideo: Cancel async worker earlier
  media: uvcvideo: Lock video streams and queues while unregistering
  media: uvcvideo: Release stream queue when unregistering video device
  media: uvcvideo: Protect uvc queue file operations against disconnect
  media: uvcvideo: Abort uvc_v4l2_open if video device is unregistered

 drivers/media/usb/uvc/uvc_ctrl.c   | 11 ++
 drivers/media/usb/uvc/uvc_driver.c | 12 ++
 drivers/media/usb/uvc/uvc_queue.c  | 32 +--
 drivers/media/usb/uvc/uvc_v4l2.c   | 45 --
 drivers/media/usb/uvc/uvcvideo.h   |  1 +
 5 files changed, 93 insertions(+), 8 deletions(-)