On Thu, May 31, 2018 at 12:21:31PM +0300, Kirill Tkhai wrote:
> Hi, Illyas,
>
> On 31.05.2018 11:43, Mansoor, Illyas wrote:
> > We are facing a mutex deadlock condition that we think might be related to a
> > fix that you have provided in:
> > Merge branch
> > 'Close-race-between-un-register_netdevice_notifier-and-pernet_operations'
> > commit b9a12601541eb55d07e00261a5112a4bc36fe7be
> >
> > We tried to backport the patch series, but got stuck because its
> > dependencies are not met in the 4.9.102 kernel.
> > Could you please provide some pointers so that we can fix this in the 4.9.y kernel?
> >
> > Appreciate any help or pointers on this one.
> >
> > Ipanic logs pasted below:
> >
> > <3>[ 6513.681473] INFO: task sensors@1.0-ser:2744 blocked for more than 120 seconds.
> > <3>[ 6513.689723] Tainted: P U W O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
> > <3>[ 6513.699108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > <6>[ 6513.707997] sensors@1.0-ser D 0 2744 1 0x00000000
> > <4>[ 6513.708007] ffff880223f38040 ffff88027fc980c0 0000000000000000 ffff880271987000
> > <4>[ 6513.708024] ffff88026f9ae040 ffffc90000d57d40 ffffffff81b363d1 ffffffff81396e0b
> > <4>[ 6513.708032] 00ffc90000d57d20 ffff88027fc980c0 ffffc90000d57d90 ffff88026f9ae040
> > <4>[ 6513.708040] Call Trace:
> > <4>[ 6513.708056] [<ffffffff81b363d1>] ? __schedule+0x221/0x6e0
> > <4>[ 6513.708063] [<ffffffff81396e0b>] ? sidtab_context_to_sid+0x39b/0x410
> > <4>[ 6513.708068] [<ffffffff81b368c6>] schedule+0x36/0x90
> > <4>[ 6513.708072] [<ffffffff81b36d18>] schedule_preempt_disabled+0x18/0x30
> > <4>[ 6513.708078] [<ffffffff81b39a25>] __mutex_lock_slowpath+0x185/0x3f0
> > <4>[ 6513.708083] [<ffffffff81b39cb5>] mutex_lock+0x25/0x30
> > <4>[ 6513.708089] [<ffffffff81993fa5>] rtnl_lock+0x15/0x20
> > <4>[ 6513.708095] [<ffffffff8197d29d>] register_netdevice_notifier+0x2d/0x200
> > <4>[ 6513.708107] [<ffffffff81ad64db>] raw_init+0x8b/0x90
> > <4>[ 6513.708118] [<ffffffff81ad52e1>] can_create+0xe1/0x1c0
> > <4>[ 6513.708129] [<ffffffff819645fe>] __sock_create+0x12e/0x210
> > <4>[ 6513.708141] [<ffffffff81965fe5>] SyS_socket+0x55/0xb0
> > <4>[ 6513.708156] [<ffffffff81001fca>] do_syscall_64+0x6a/0xe0
> > <4>[ 6513.708166] [<ffffffff81b3dd20>] entry_SYSCALL_64_after_swapgs+0x5d/0xd7
> > <4>[ 6513.708171] NMI backtrace for cpu 2
> > <4>[ 6513.708178] CPU: 2 PID: 482 Comm: khungtaskd Tainted: P U W O 4.9.102-quilt-2e5dc0ac-07850-g222b9655589b #1
> > <4>[ 6513.708180] ffffc90000eafdd0 ffffffff813f56bc 0000000000000000 0000000000000000
> > <4>[ 6513.708188] ffffc90000eafe00 ffffffff813f9fe1 0000000000000002 0000000000000000
> > <4>[ 6513.708195] ffffffff81042d80 ffffffff826120f8 ffffc90000eafe30 ffffffff813fa0a3
>
> 1) I'm not sure commit b9a12601541eb55d07e00261a5112a4bc36fe7be will help
> here, because this stack looks to me like someone simply does not release
> the mutex. It may be worth first analyzing who actually owns it.
>
> 2) Also, note that rtnl_is_locked() is used in the wrong way in one driver
> there (see WILC_WFI_deinit_mon_interface()), so it may also introduce an
> imbalance (if you use the driver).
>
Thank you for your quick response. We will look into your suggestions and get back to you.

Thanks,
Pankaj

> Kirill
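Point 2 above depends on the fact that rtnl_is_locked() reports whether the RTNL mutex is held by anyone, not whether the current task holds it. Below is a minimal sketch, a toy single-threaded user-space model in C and not kernel code, that replays one interleaving of an "unlock if locked, re-lock afterwards" pattern. The names toy_mutex, toy_is_locked(), toy_trylock(), toy_unlock(), buggy_interleaving() and task3_would_block() are hypothetical, and the exact shape of the pattern is an assumption here; check the real WILC_WFI_deinit_mon_interface() in your tree. The model shows how such code can leave a task holding a lock its caller never took, so that a later rtnl_lock() caller (such as the socket() -> raw_init() -> register_netdevice_notifier() path in the trace above) would wait indefinitely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of a sleeping mutex (NOT kernel code). */
struct toy_mutex {
	int owner; /* id of the task holding the lock; 0 means free */
};

/* Like rtnl_is_locked(): reports whether ANY task holds the lock,
 * not whether the calling task does. */
static bool toy_is_locked(const struct toy_mutex *m)
{
	return m->owner != 0;
}

/* Stand-in for a blocking lock: returns false where a real lock would sleep. */
static bool toy_trylock(struct toy_mutex *m, int task)
{
	assert(task != 0); /* 0 is reserved for "free" */
	if (m->owner != 0)
		return false;
	m->owner = task;
	return true;
}

/* Simplification: clears the owner without checking who calls it.
 * Real kernel mutexes must only be released by their owner. */
static void toy_unlock(struct toy_mutex *m)
{
	m->owner = 0;
}

/* Replays one interleaving of the assumed unlock-if-locked / relock-later
 * pattern and returns the id of the task left holding the lock (0 = free). */
static int buggy_interleaving(void)
{
	struct toy_mutex rtnl = { 0 };
	bool rollback_lock = false;

	toy_trylock(&rtnl, 1);          /* task 1 legitimately takes the lock */

	/* Task 2: its caller does NOT hold the lock. */
	if (toy_is_locked(&rtnl)) {     /* true, but only because of task 1 */
		toy_unlock(&rtnl);      /* drops task 1's lock */
		rollback_lock = true;
	}

	toy_unlock(&rtnl);              /* task 1 finishes and unlocks (already free here) */

	/* Task 2: ... unregister the interface ..., then "restore" the lock. */
	if (rollback_lock)
		toy_trylock(&rtnl, 2);  /* takes a lock its caller never held */

	return rtnl.owner;              /* task 2 returns still holding it */
}

/* Would task 3 (standing in for the socket() -> raw_init() ->
 * register_netdevice_notifier() path) have to wait after that interleaving? */
static bool task3_would_block(void)
{
	struct toy_mutex rtnl = { .owner = buggy_interleaving() };

	return !toy_trylock(&rtnl, 3);
}
```

In the model, "who owns it" (point 1) is simply the owner field; on a real kernel, lockdep's held-lock report (for example SysRq 'd', when lockdep is enabled) is one possible way to answer that question. One common way to avoid guessing from rtnl_is_locked() is for code that already runs under RTNL to call the variant that expects the lock to be held, e.g. unregister_netdevice() rather than unregister_netdev(), which takes RTNL itself, instead of dropping and re-taking the lock.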