On Fri, 24 Nov 2017 14:04:35 +0000, Vincent Pelletier
<[email protected]> wrote:
> What seem to be the relevant pieces are:
> - at least one AIO transfers submitted for reading from EP2OUT
> - upon receiving data from stdin, a synchronous write happens on EP2IN,
> which blocks if host did not submit a transfer (normal)
> - SIGQUIT to interrupt the write while it's blocking
I could simplify the issue requirements: it is not required to use
synchronous IO. Submitting AIO and killing functionfs-interfaced
process is sufficient. This time, I kill with signal 15, which makes
the process exit without (in my understanding) returning control to
python (no sighandler set).
For some reason, I could now get kgdb to start with modified module
loaded, and inspect the status a bit more.
To my surprise, the error symptom do not seem to change:
(gdb) lx-dmesg
[...]
[ 382.505547] BUG: scheduling while atomic: swapper/1/0/0x00000100
[ 382.511767] 3 locks held by swapper/1/0:
[ 382.515903] #0: (rcu_callback){....}, at: [<c10b4ff0>]
rcu_process_callbacks+0x260/0x440
[ 382.524572] #1: (rcu_read_lock_sched){....}, at: [<c1358ba0>]
percpu_ref_switch_to_atomic_rcu+0xb0/0x130
[ 382.534650] #2: (&(&ctx->ctx_lock)->rlock){....}, at: [<c11f0c73>]
free_ioctx_users+0x23/0xd0
[ 382.534699] Modules linked in: usb_f_fs libcomposite configfs bnep btsdio
bluetooth ecdh_generic brcmfmac brcmutil dwc3 ulpi intel_powerclamp udc_core
coretemp kvm_intel kvm irqbypass crc32_pclmul crc32c_intel pcbc dwc3_pci
aesni_intel aes_i586 crypto_simd cryptd ehci_pci ehci_hcd basincove_gpadc
gpio_keys industrialio usbcore usb_common
[ 382.534945] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.0-edison+ #117
[ 382.534955] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542
2015.01.21:18.19.48
[ 382.534963] Call Trace:
[ 382.534976] <SOFTIRQ>
[ 382.535004] dump_stack+0x47/0x62
[ 382.535034] __schedule_bug+0x61/0x80
[ 382.535057] __schedule+0x43/0x7a0
[ 382.535116] schedule+0x5f/0x70
[ 382.535155] dwc3_gadget_ep_dequeue+0x14c/0x270 [dwc3]
[ 382.535196] ? do_wait_intr_irq+0x70/0x70
[ 382.535248] usb_ep_dequeue+0x19/0x90 [udc_core]
[ 382.535292] ffs_aio_cancel+0x22/0x30 [usb_f_fs]
[ 382.535312] kiocb_cancel+0x31/0x40
[ 382.535333] free_ioctx_users+0x4d/0xd0
[ 382.535367] percpu_ref_switch_to_atomic_rcu+0x10a/0x130
[ 382.535389] ? percpu_ref_exit+0x40/0x40
[ 382.535410] rcu_process_callbacks+0x2b3/0x440
[ 382.535460] ? __softirqentry_text_start+0x8/0x8
[ 382.535476] __do_softirq+0xf8/0x26b
[ 382.535517] ? __softirqentry_text_start+0x8/0x8
[ 382.535538] do_softirq_own_stack+0x22/0x30
[ 382.535547] </SOFTIRQ>
[ 382.535573] irq_exit+0x45/0xc0
[ 382.535591] smp_apic_timer_interrupt+0x13c/0x150
[ 382.535618] apic_timer_interrupt+0x35/0x3c
[ 382.535637] EIP: cpuidle_enter_state+0x26b/0x330
[ 382.535647] EFLAGS: 00200202 CPU: 1
[ 382.535657] EAX: 00000000 EBX: c1bfd680 ECX: 00000002 EDX: 00000000
[ 382.535668] ESI: 00016e40 EDI: 00000059 EBP: f6911f60 ESP: f6911f2c
[ 382.535679] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 382.535792] cpuidle_enter+0x14/0x20
[ 382.535815] call_cpuidle+0x35/0x40
[ 382.535836] do_idle+0x114/0x180
[ 382.535862] cpu_startup_entry+0x25/0x30
[ 382.535884] start_secondary+0x191/0x1a0
[ 382.535915] startup_32_smp+0x164/0x166
[ 382.536327] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 382.536351] IP: 0xf6909f43
[ 382.536357] *pde = 00000000
[ 382.536372] Oops: 0000 [#1] SMP
[ 393.811964] WARNING: stack recursion on stack type 3
I tried to confirm I'm running the modified kernel:
(gdb) disassemble ffs_aio_cancel
Dump of assembler code for function ffs_aio_cancel:
0xf819c6f0 <+0>: lea %ds:0x0(%esi,%eiz,1),%esi
0xf819c6f5 <+5>: mov 0x10(%eax),%edx
0xf819c6f8 <+8>: test %edx,%edx
0xf819c6fa <+10>: je 0xf819c718 <ffs_aio_cancel+40>
0xf819c6fc <+12>: mov 0x4c(%edx),%eax
0xf819c6ff <+15>: test %eax,%eax
0xf819c701 <+17>: je 0xf819c718 <ffs_aio_cancel+40>
0xf819c703 <+19>: mov 0x50(%edx),%edx
0xf819c706 <+22>: test %edx,%edx
0xf819c708 <+24>: je 0xf819c718 <ffs_aio_cancel+40>
0xf819c70a <+26>: push %ebp
0xf819c70b <+27>: mov %esp,%ebp
0xf819c70d <+29>: call 0xf80ba220 <usb_ep_dequeue>
0xf819c712 <+34>: pop %ebp
0xf819c713 <+35>: ret
0xf819c714 <+36>: lea 0x0(%esi,%eiz,1),%esi
0xf819c718 <+40>: mov $0xffffffea,%eax
0xf819c71d <+45>: ret
End of assembler dump.
Although I do not speak x86 fluently, I would expect several more
instructions or calls if IRQs were being disabled, so I believe I am
running the modified version.
I was surprised to see a high-pid process about dwc3:
(gdb) info threads
[...]
105 Thread 1852 (htop) 0x00000000 in ?? () <- from where I killed python
(signal 15)
106 Thread 1879 (strace) 0x00000000 in ?? () <- strace python [...]
107 Thread 1881 (python) 0x00000000 in ?? () <- python, interfacing with
functionfs
108 Thread 1882 (irq/34-dwc3) 0x00000000 in ?? ()
So I grabbed the backtraces of both the dwc3 thread and python:
(gdb) t 108
[Switching to thread 108 (Thread 1882)]
#0 0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0xf41cd340 in ?? ()
#2 0xc18474af in schedule () at kernel/sched/core.c:3425
#3 0xc10a99c5 in irq_wait_for_interrupt (action=<optimized out>) at
kernel/irq/manage.c:768
#4 irq_thread (data=0x0) at kernel/irq/manage.c:972
#5 0xc106d5e1 in kthread (_create=0xf4f52c00) at kernel/kthread.c:231
#6 0xc184cc41 in ret_from_fork () at arch/x86/entry/entry_32.S:293
Normal IRQ handler, I guess. Maybe it only has a higher pid because
gadget became necessarily active after python script opened endpoints,
and handler would only be forked when it becomes needed ?
(gdb) t 107
[Switching to thread 107 (Thread 1881)]
#0 0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0xf568d600 in ?? ()
#2 0xc18474af in schedule () at kernel/sched/core.c:3425
#3 0xc184b5ca in schedule_timeout (timeout=<optimized out>) at
kernel/time/timer.c:1720
#4 0xc184848b in do_wait_for_common (state=<optimized out>,
timeout=2147483647, action=<optimized out>, x=<optimized out>) at
kernel/sched/completion.c:91
#5 __wait_for_common (state=<optimized out>, timeout=<optimized out>,
action=<optimized out>, x=<optimized out>) at kernel/sched/completion.c:112
#6 wait_for_common (state=<optimized out>, timeout=<optimized out>,
x=<optimized out>) at kernel/sched/completion.c:123
#7 wait_for_completion (x=0xf4e25df8) at kernel/sched/completion.c:144
#8 0xc11f0e8a in exit_aio (mm=0xf568d600) at fs/aio.c:903
#9 0xc104e8a8 in __mmput (mm=<optimized out>) at kernel/fork.c:925
#10 0xc1055da2 in exit_mm () at kernel/exit.c:544
#11 do_exit (code=<optimized out>) at kernel/exit.c:852
#12 0xc10564c7 in do_group_exit (exit_code=15) at kernel/exit.c:968
#13 0xc1060463 in get_signal (ksig=<optimized out>) at kernel/signal.c:2334
#14 0xc101888e in do_signal (regs=0xf4e25fb4) at arch/x86/kernel/signal.c:809
#15 0xc1000fd3 in exit_to_usermode_loop (regs=0xf4e25fb4, cached_flags=7) at
arch/x86/entry/common.c:158
#16 0xc10018c1 in prepare_exit_to_usermode (regs=<optimized out>) at
arch/x86/entry/common.c:197
#17 syscall_return_slowpath (regs=<optimized out>) at
arch/x86/entry/common.c:266
#18 do_syscall_32_irqs_on (regs=<optimized out>) at arch/x86/entry/common.c:335
#19 do_fast_syscall_32 (regs=0xf4e25fb4) at arch/x86/entry/common.c:391
#20 0xc184ccb3 in entry_SYSENTER_32 () at arch/x86/entry/entry_32.S:444
#21 0x00000008 in ?? ()
#22 0x01cad558 in ?? ()
#23 0x000003ff in tlv_data_is_valid (hdev=<optimized out>,
adv_flags=4294967295, data=0x0, len=<error reading variable: Cannot access
memory at address 0xbf80543c>, is_adv_data=8)
at net/bluetooth/mgmt.c:6092
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I'll try to continue debugging tomorrow, but pointers are very welcome.
Regards,
--
Vincent Pelletier
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html