For reasons that are a bit off-topic here, I'm compiling the LEDE kernel (4.9.31) with several debug checks enabled, and I'm running it on a Lantiq xrx200 board (Fritzbox 3370).
I've hit a bug in the net/phy/swconfig.c driver: in swconfig_get_dev() we attempt to take dev->sw_mutex while holding a spinlock (swdevs_lock, taken in swconfig_lock()). AFAIK sleeping while holding a spinlock can deadlock, and it does make the kernel complain [0].

I'm not familiar with this driver code, nor with generic netlink, but I have noticed the following:

1. There is a "parallel_ops" flag in struct genl_family. I'm not sure, but when it is unset the network core seems to serialize all calls to the genl_ops callbacks; if so, is in-driver locking really required here?

2. Looking at the net core code that implements that serialization [1], it uses a regular mutex (not a spinlock), so I guess the genl_ops callbacks are not called in atomic context. Why do we need a spinlock in the swconfig.c driver at all? I couldn't spot any interrupt handler or tasklet in the driver code that could race with the genl_ops (and in those cases I would expect spin_lock_irqsave() or spin_lock_bh() to be used in the genl_ops functions); also, after an overview of the external users of the exported functions (a bunch of PHY drivers), I couldn't find any usage that justifies a spinlock. Am I missing something?

3. The two locks above are also taken in the reverse order in unregister_switch(). Couldn't the reversed (ABBA) order cause another deadlock?

If you could give me some clues about how locking is designed in the swconfig driver (i.e. what the two locks are for), I could try to fix the issue myself and post a patch.
Thanks
Andrea

[0]
[ 11.671567] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ 11.678696] in_atomic(): 1, irqs_disabled(): 0, pid: 711, name: swconfig
[ 11.685376] 3 locks held by swconfig/711:
[ 11.689384] #0: (cb_lock){......}, at: [<80453a34>] genl_rcv+0x20/0x48
[ 11.696054] #1: (genl_mutex){......}, at: [<80453aac>] genl_rcv_msg+0x50/0x364
[ 11.703444] #2: (swdevs_lock){......}, at: [<8039e48c>] swconfig_get_dev.isra.6+0x2c/0xd0
[ 11.711816] CPU: 1 PID: 711 Comm: swconfig Not tainted 4.9.31 #0
[ 11.717739] Stack : 00000000 00000000 80e6c56a 00000034 80573bb4 00000000 00000000 807a0000
[ 11.726084]         87e1a7ec 80795887 806f38bc 00000001 000002c7 809041fc 00000000 00000000
[ 11.734439]         00000000 8007da38 00000000 00000000 00000000 8007da38 806faf04 874d1aec
[ 11.742795]         00000002 800be57c 806f97e4 874d1afc 00000000 80790000 00000000 874d1a00
[ 11.751151]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 11.759507]         ...
[ 11.761946] Call Trace:
[ 11.764400] [<80010b38>] show_stack+0x50/0x84
[ 11.768788] [<802c0374>] dump_stack+0xd4/0x110
[ 11.773200] [<800569b0>] ___might_sleep+0xfc/0x11c
[ 11.778011] [<80556058>] mutex_lock_nested+0x4c/0x454
[ 11.783050] [<8039e4d0>] swconfig_get_dev.isra.6+0x70/0xd0
[ 11.788520] [<8039e56c>] swconfig_list_attrs+0x3c/0x230
[ 11.793751] [<80453d10>] genl_rcv_msg+0x2b4/0x364
[ 11.798449] [<80452db4>] netlink_rcv_skb+0x7c/0xf8
[ 11.803228] [<80453a44>] genl_rcv+0x30/0x48
[ 11.807415] [<804524d0>] netlink_unicast+0x190/0x2c4
[ 11.812367] [<80452b54>] netlink_sendmsg+0x418/0x44c
[ 11.817356] [<803ff4ac>] sock_sendmsg+0x18/0x30
[ 11.821864] [<80400948>] ___sys_sendmsg+0x1bc/0x25c
[ 11.826731] [<804018a8>] __sys_sendmsg+0x4c/0x80
[ 11.831353] [<8001ad38>] syscall_common+0x34/0x58

[1]
http://elixir.free-electrons.com/linux/latest/source/net/netlink/genetlink.c#L30
http://elixir.free-electrons.com/linux/latest/source/net/netlink/genetlink.c#L610
_______________________________________________
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev