On Wed, Oct 03, 2018 at 01:20:44PM +0200, Håkon Bugge wrote:
> Hi Greg,
> 
> 
> I hope you will find this note appropriate.
> 
> The stable cherry-pick of upstream commit ebeeb1ad9b8a ("rds: tcp: use 
> rds_destroy_pending() to synchronize netns/module teardown and rds 
> connection/workq management") provokes the following stack trace when running 
> with debug:
> 
> 
> kernel: BUG: sleeping function called from invalid context at 
> kernel/locking/mutex.c:748
> kernel: =============================
> kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 4392, name: rds-stress
> kernel: 1 lock held by rds-stress/4392:
> kernel: #0: 00000000df837d5e
> kernel: WARNING: suspicious RCU usage
> kernel: 4.18.8 #1 Not tainted
> kernel: -----------------------------
> kernel: ./include/linux/rcupdate.h:303 Illegal context switch in RCU 
> read-side critical section!
> kernel: (
> kernel: #012other info that might help us debug this:
> kernel: #012rcu_scheduler_active = 2, debug_locks = 1
> kernel: rcu_read_lock){....}
> kernel: 1 lock held by rds-stress/4393:
> kernel: #0:
> kernel: , at: __rds_conn_create+0x604/0x960 [rds]
> kernel: 00000000df837d5e
> kernel: CPU: 38 PID: 4392 Comm: rds-stress Not tainted 4.18.8 #1
> kernel: Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO 
> TRAY,2U, BIOS 31110000 03/03/2017
> kernel: (rcu_read_lock
> kernel: Call Trace:
> kernel: ){....}
> kernel: dump_stack+0x81/0xb8
> kernel: , at: __rds_conn_create+0x604/0x960 [rds]
> kernel: #012stack backtrace:
> kernel: ___might_sleep+0x239/0x260
> kernel: __might_sleep+0x4a/0x80
> kernel: __mutex_lock+0x58/0x9c0
> kernel: ? __lock_acquire+0x47f/0x7e0
> kernel: ? pcpu_alloc+0x429/0x860
> kernel: ? find_held_lock+0x40/0xb0
> kernel: ? create_object+0x22f/0x320
> kernel: ? _raw_write_unlock_irqrestore+0x36/0x60
> kernel: mutex_lock_killable_nested+0x1b/0x20
> kernel: pcpu_alloc+0x429/0x860
> kernel: ? create_object+0x22f/0x320
> kernel: __alloc_percpu+0x15/0x20
> kernel: rds_ib_recv_alloc_cache+0x1c/0x80 [rds_rdma]
> kernel: rds_ib_recv_alloc_caches+0x1d/0x60 [rds_rdma]
> kernel: rds_ib_conn_alloc+0x46/0x170 [rds_rdma]
> kernel: __rds_conn_create+0x68d/0x960 [rds]
> kernel: ? __rds_conn_create+0x604/0x960 [rds]
> kernel: rds_conn_create_outgoing+0x14/0x20 [rds]
> kernel: rds_sendmsg+0x2e8/0xcd0 [rds]
> kernel: ? copy_msghdr_from_user+0xdb/0x140
> kernel: sock_sendmsg+0x38/0x50
> kernel: ___sys_sendmsg+0x27b/0x290
> kernel: ? __lock_acquire+0x47f/0x7e0
> kernel: ? find_held_lock+0x40/0xb0
> kernel: ? __audit_syscall_entry+0xdf/0x160
> kernel: ? ktime_get_coarse_real_ts64+0x6e/0xe0
> kernel: ? trace_hardirqs_on_caller+0x128/0x1b0
> kernel: ? trace_hardirqs_on+0xd/0x10
> kernel: ? __audit_syscall_entry+0xdf/0x160
> kernel: ? __audit_syscall_entry+0xdf/0x160
> kernel: __sys_sendmsg+0x5d/0xb0
> kernel: __x64_sys_sendmsg+0x1f/0x30
> kernel: do_syscall_64+0x5f/0x220
> kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Command line:
> 
> $ rds-stress -r <IB port 1 IP>& sleep 1; rds-stress -r <IB port 2 IP> -s <IB 
> port 1 IP> -T 10
> 
> Deliberately or accidently, Ka-Cheong's commit f394ad28feff ("rds: 
> rds_ib_recv_alloc_cache() should call alloc_percpu_gfp() instead") fixes the 
> bug introduced by commit ebeeb1ad9b8a. Kudos to Zhu Yanjun who quickly 
> detected this.
> 
> But be aware, commit f394ad28feff does not contain the "Fixes:" tag.
> 
> Hence, I suggest that in all stable releases containing commit ebeeb1ad9b8a, 
> f394ad28feff must be included as well.

Great, thanks for the information.  Can you submit this info to the
netdev developers who will queue it up for a stable release?  Or, as
David is already on the cc: list here, he can just tell me to
cherry-pick it and I can do it on my own :)

thanks,

greg k-h

Reply via email to