I've recently encountered a problem after upgrading from 3.0.57-rt82 to
3.10.47-rt50 where process_backlog gets interrupted and does not resume
for a while, which results in packets not being processed in time. I see
net_rx_action, which then calls process_backlog (as the poll method to
process the backlog of packets queued up the netif_rx) but then after the
interruption, it does not finish for about 5ms. In the older kernel it
would finish based on the priority of ksoftirqd. This is no longer the
case.
I have priorities configured so that hard interrupts are highest,
ksoftirqd next (both are SCHED_FIFO) and then my program is currently
SCHED_OTHER but I still do not see the rx softirq finish before my program
runs.
This is all on a single core powerpc device. I do not see these problems
with a net device which uses NAPI directly (as such i'm updating my driver
to use NAPI) but it seems like there is a real bug here somewhere. I have
not been able to find any mention of similar problems (perhaps few people
are using netif_rx these days).
I've attached a recording from perf which shows the problem. Specifically,
you see net_rx_action run at time 213.079014 and then it doesn't finish
until about 5ms later at time 213.084953 which i not the case on the older
kernels. It seems something has changed with softirq handling or
process_backlog needed adapted for it. My suspicion is this has something
to do with the work mentioned in 210dc110063cf040d3209fddf766f6fcafccdc34
but I'm not an expert with this area of the kernel.
Any tips or suggestions would be appreciated. Please CC me as I am not
subscribed to lkml.
Thanks,
Trevor
ethernet_idle 3065 [000] 213.078439: printk:console: [ 213.078426] send
db
ethernet_idle 3065 [000] 213.078458: printk:console: [ 213.078449]
ether_device xmit
ethernet_idle 3065 [000] 213.078470: net:net_dev_queue: dev=eth2
skbaddr=(nil) len=60
ethernet_idle 3065 [000] 213.078492: printk:console: [ 213.078482] enet
xmit
ethernet_idle 3065 [000] 213.078524: irq:irq_handler_entry: irq=21
name=sel_enet_mac
ethernet_idle 3065 [000] 213.078532: irq:irq_handler_exit: irq=21
ret=handled
ethernet_idle 3065 [000] 213.078552: net:net_dev_xmit: dev=eth2
skbaddr=(nil) len=60 rc=0
ethernet_idle 3065 [000] 213.078569: sched:sched_switch:
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=R+ ==>
next_comm=irq/21-sel_enet next_pid=3058 next_prio=49
irq/21-sel_enet 3058 [000] 213.078603: printk:console: [ 213.078586] enet
rx
irq/21-sel_enet 3058 [000] 213.078629: net:netif_rx: dev=eth2 skbaddr=(nil)
len=46
irq/21-sel_enet 3058 [000] 213.078637: irq:softirq_raise: vec=3
[action=NET_RX]
irq/21-sel_enet 3058 [000] 213.078655: printk:console: [ 213.078645] enet
tx
irq/21-sel_enet 3058 [000] 213.078673: irq:softirq_entry: vec=3
[action=NET_RX]
irq/21-sel_enet 3058 [000] 213.078691: printk:console: [ 213.078680]
process_backlog entry
irq/21-sel_enet 3058 [000] 213.078702: net:netif_receive_skb: dev=eth2
skbaddr=(nil) len=46
irq/21-sel_enet 3058 [000] 213.078726: printk:console: [ 213.078711]
ether_device recv
irq/21-sel_enet 3058 [000] 213.078736: printk:console: [ 213.078716]
process_backlog exit 1
irq/21-sel_enet 3058 [000] 213.078747: napi:napi_poll: napi poll on napi
struct (nil) for device (no_device)
irq/21-sel_enet 3058 [000] 213.078754: irq:softirq_exit: vec=3
[action=NET_RX]
irq/21-sel_enet 3058 [000] 213.078779: sched:sched_switch:
prev_comm=irq/21-sel_enet prev_pid=3058 prev_prio=49 prev_state=S ==>
next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.078965: net:net_dev_queue: dev=lo
skbaddr=(nil) len=389
database 3224 [000] 213.078988: net:netif_rx: dev=lo skbaddr=(nil)
len=375
database 3224 [000] 213.078996: irq:softirq_raise: vec=3
[action=NET_RX]
database 3224 [000] 213.079005: net:net_dev_xmit: dev=lo
skbaddr=(nil) len=389 rc=0
database 3224 [000] 213.079014: irq:softirq_entry: vec=3
[action=NET_RX]
database 3224 [000] 213.079052: printk:console: [ 213.079026]
process_backlog entry
database 3224 [000] 213.079066: net:netif_receive_skb: dev=lo
skbaddr=(nil) len=375
database 3224 [000] 213.079152: net:net_dev_queue: dev=lo
skbaddr=(nil) len=66
database 3224 [000] 213.079204: irq:softirq_raise: vec=1
[action=TIMER]
database 3224 [000] 213.079263: sched:sched_switch:
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==>
next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.079284: irq:softirq_entry: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.079302: irq:softirq_exit: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.079322: sched:sched_switch:
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=cat
next_pid=3197 next_prio=120
cat 3197 [000] 213.079489: sched:sched_switch: prev_comm=cat
prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198
next_prio=120
grep 3198 [000] 213.079638: sched:sched_switch: prev_comm=grep
prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle
next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.079672: sched:sched_switch:
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=S ==>
next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.079694: net:netif_rx: dev=lo skbaddr=(nil)
len=52
database 3224 [000] 213.079702: irq:softirq_raise: vec=3
[action=NET_RX]
database 3224 [000] 213.079711: net:net_dev_xmit: dev=lo
skbaddr=(nil) len=66 rc=0
database 3224 [000] 213.079739: sched:sched_switch:
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==>
next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.079770: sched:sched_switch: prev_comm=dbpool
prev_pid=3217 prev_prio=120 prev_state=D ==> next_comm=db_ctl next_pid=3225
next_prio=120
db_ctl 3225 [000] 213.080213: irq:softirq_raise: vec=1
[action=TIMER]
db_ctl 3225 [000] 213.080270: sched:sched_switch: prev_comm=db_ctl
prev_pid=3225 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3
next_prio=2
ksoftirqd/0 3 [000] 213.080294: irq:softirq_entry: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.080312: irq:softirq_exit: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.080332: sched:sched_switch:
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==>
next_comm=ethernet_idle next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.080382: printk:console: [ 213.080353]
ether_device poll
ethernet_idle 3065 [000] 213.080443: printk:console: [ 213.080430] send
dd
ethernet_idle 3065 [000] 213.080464: printk:console: [ 213.080454]
ether_device xmit
ethernet_idle 3065 [000] 213.080476: net:net_dev_queue: dev=eth2
skbaddr=(nil) len=60
ethernet_idle 3065 [000] 213.080498: printk:console: [ 213.080487] enet
xmit
ethernet_idle 3065 [000] 213.080530: irq:irq_handler_entry: irq=21
name=sel_enet_mac
ethernet_idle 3065 [000] 213.080539: irq:irq_handler_exit: irq=21
ret=handled
ethernet_idle 3065 [000] 213.080560: net:net_dev_xmit: dev=eth2
skbaddr=(nil) len=60 rc=0
ethernet_idle 3065 [000] 213.080577: sched:sched_switch:
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=R+ ==>
next_comm=irq/21-sel_enet next_pid=3058 next_prio=49
irq/21-sel_enet 3058 [000] 213.080611: printk:console: [ 213.080595] enet
rx
irq/21-sel_enet 3058 [000] 213.080637: net:netif_rx: dev=eth2 skbaddr=(nil)
len=46
irq/21-sel_enet 3058 [000] 213.080656: printk:console: [ 213.080645] enet
tx
irq/21-sel_enet 3058 [000] 213.080687: sched:sched_switch:
prev_comm=irq/21-sel_enet prev_pid=3058 prev_prio=49 prev_state=S ==>
next_comm=ethernet_idle next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.080714: sched:sched_switch:
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=S ==>
next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.080747: sched:sched_switch:
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==>
next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.080801: sched:sched_switch: prev_comm=dbpool
prev_pid=3217 prev_prio=120 prev_state=D ==> next_comm=database next_pid=3224
next_prio=120
database 3224 [000] 213.080825: sched:sched_switch:
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==>
next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.081210: irq:softirq_raise: vec=1
[action=TIMER]
dbpool 3217 [000] 213.081264: sched:sched_switch: prev_comm=dbpool
prev_pid=3217 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=3
next_prio=2
ksoftirqd/0 3 [000] 213.081286: irq:softirq_entry: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.081311: irq:softirq_exit: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.081331: sched:sched_switch:
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbcl
next_pid=3222 next_prio=120
dbcl 3222 [000] 213.081581: sched:sched_switch: prev_comm=dbcl
prev_pid=3222 prev_prio=120 prev_state=S ==> next_comm=cat next_pid=3197
next_prio=120
cat 3197 [000] 213.081733: sched:sched_switch: prev_comm=cat
prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198
next_prio=120
grep 3198 [000] 213.081853: sched:sched_switch: prev_comm=grep
prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=dbpool next_pid=3217
next_prio=120
dbpool 3217 [000] 213.082210: irq:softirq_raise: vec=1
[action=TIMER]
dbpool 3217 [000] 213.082264: sched:sched_switch: prev_comm=dbpool
prev_pid=3217 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3
next_prio=2
ksoftirqd/0 3 [000] 213.082286: irq:softirq_entry: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.082297: irq:softirq_exit: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.082316: sched:sched_switch:
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbcl
next_pid=3222 next_prio=120
dbcl 3222 [000] 213.082595: sched:sched_switch: prev_comm=dbcl
prev_pid=3222 prev_prio=120 prev_state=S ==> next_comm=dbpool next_pid=3217
next_prio=120
dbpool 3217 [000] 213.082937: sched:sched_switch: prev_comm=dbpool
prev_pid=3217 prev_prio=120 prev_state=S ==> next_comm=dbcl next_pid=3222
next_prio=120
dbcl 3222 [000] 213.083207: irq:softirq_raise: vec=1
[action=TIMER]
dbcl 3222 [000] 213.083260: sched:sched_switch: prev_comm=dbcl
prev_pid=3222 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=3
next_prio=2
ksoftirqd/0 3 [000] 213.083281: irq:softirq_entry: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.083291: irq:softirq_exit: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.083310: sched:sched_switch:
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbcl
next_pid=3222 next_prio=120
dbcl 3222 [000] 213.083455: sched:sched_switch: prev_comm=dbcl
prev_pid=3222 prev_prio=120 prev_state=S ==> next_comm=dbpool next_pid=3217
next_prio=120
dbpool 3217 [000] 213.084211: irq:softirq_raise: vec=1
[action=TIMER]
dbpool 3217 [000] 213.084255: net:net_dev_queue: dev=lo
skbaddr=(nil) len=81
dbpool 3217 [000] 213.084274: sched:sched_switch: prev_comm=dbpool
prev_pid=3217 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3
next_prio=2
ksoftirqd/0 3 [000] 213.084297: irq:softirq_entry: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.084306: irq:softirq_exit: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.084325: sched:sched_switch:
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbpool
next_pid=3217 next_prio=120
dbpool 3217 [000] 213.084345: net:netif_rx: dev=lo skbaddr=(nil)
len=67
dbpool 3217 [000] 213.084354: net:net_dev_xmit: dev=lo
skbaddr=(nil) len=81 rc=0
dbpool 3217 [000] 213.084495: sched:sched_switch: prev_comm=dbpool
prev_pid=3217 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle
next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.084558: printk:console: [ 213.084528]
ether_device poll
ethernet_idle 3065 [000] 213.084591: printk:console: [ 213.084581]
WARNING 0: 1 datagram timed out!
ethernet_idle 3065 [000] 213.084650: printk:console: [ 213.084639] send
df
ethernet_idle 3065 [000] 213.084668: printk:console: [ 213.084659]
ether_device xmit
ethernet_idle 3065 [000] 213.084680: net:net_dev_queue: dev=eth2
skbaddr=(nil) len=60
ethernet_idle 3065 [000] 213.084702: printk:console: [ 213.084692] enet
xmit
ethernet_idle 3065 [000] 213.084734: irq:irq_handler_entry: irq=21
name=sel_enet_mac
ethernet_idle 3065 [000] 213.084742: irq:irq_handler_exit: irq=21
ret=handled
ethernet_idle 3065 [000] 213.084763: net:net_dev_xmit: dev=eth2
skbaddr=(nil) len=60 rc=0
ethernet_idle 3065 [000] 213.084780: sched:sched_switch:
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=R+ ==>
next_comm=irq/21-sel_enet next_pid=3058 next_prio=49
irq/21-sel_enet 3058 [000] 213.084815: printk:console: [ 213.084798] enet
rx
irq/21-sel_enet 3058 [000] 213.084841: net:netif_rx: dev=eth2 skbaddr=(nil)
len=46
irq/21-sel_enet 3058 [000] 213.084860: printk:console: [ 213.084849] enet
tx
irq/21-sel_enet 3058 [000] 213.084892: sched:sched_switch:
prev_comm=irq/21-sel_enet prev_pid=3058 prev_prio=49 prev_state=S ==>
next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.084925: printk:console: [ 213.084910]
process_backlog exit 1
database 3224 [000] 213.084936: napi:napi_poll: napi poll on napi
struct (nil) for device (no_device)
database 3224 [000] 213.084945: irq:softirq_raise: vec=3
[action=NET_RX]
database 3224 [000] 213.084953: irq:softirq_exit: vec=3
[action=NET_RX]
database 3224 [000] 213.084962: irq:softirq_entry: vec=3
[action=NET_RX]
database 3224 [000] 213.084979: printk:console: [ 213.084969]
process_backlog entry
database 3224 [000] 213.084990: net:netif_receive_skb: dev=lo
skbaddr=(nil) len=52
database 3224 [000] 213.085025: net:netif_receive_skb: dev=eth2
skbaddr=(nil) len=46
database 3224 [000] 213.085045: printk:console: [ 213.085032]
ether_device recv
database 3224 [000] 213.085057: net:netif_receive_skb: dev=lo
skbaddr=(nil) len=67
database 3224 [000] 213.085072: net:netif_receive_skb: dev=eth2
skbaddr=(nil) len=46
database 3224 [000] 213.085090: printk:console: [ 213.085078]
ether_device recv
database 3224 [000] 213.085109: printk:console: [ 213.085099]
process_backlog exit 1
database 3224 [000] 213.085118: napi:napi_poll: napi poll on napi
struct (nil) for device (no_device)
database 3224 [000] 213.085124: irq:softirq_exit: vec=3
[action=NET_RX]
database 3224 [000] 213.085199: irq:softirq_raise: vec=1
[action=TIMER]
database 3224 [000] 213.085247: sched:sched_switch:
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==>
next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.085268: irq:softirq_entry: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.085284: irq:softirq_exit: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.085304: sched:sched_switch:
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=cat
next_pid=3197 next_prio=120
cat 3197 [000] 213.085488: sched:sched_switch: prev_comm=cat
prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198
next_prio=120
grep 3198 [000] 213.085656: sched:sched_switch: prev_comm=grep
prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle
next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.085689: sched:sched_switch:
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=S ==>
next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.085812: net:net_dev_queue: dev=lo
skbaddr=(nil) len=66
database 3224 [000] 213.085835: net:netif_rx: dev=lo skbaddr=(nil)
len=52
database 3224 [000] 213.085843: irq:softirq_raise: vec=3
[action=NET_RX]
database 3224 [000] 213.085852: net:net_dev_xmit: dev=lo
skbaddr=(nil) len=66 rc=0
database 3224 [000] 213.085866: irq:softirq_entry: vec=3
[action=NET_RX]
database 3224 [000] 213.085905: printk:console: [ 213.085878]
process_backlog entry
database 3224 [000] 213.085920: net:netif_receive_skb: dev=lo
skbaddr=(nil) len=52
database 3224 [000] 213.086000: printk:console: [ 213.085979]
process_backlog exit 1
database 3224 [000] 213.086016: napi:napi_poll: napi poll on napi
struct (nil) for device (no_device)
database 3224 [000] 213.086024: irq:softirq_exit: vec=3
[action=NET_RX]
database 3224 [000] 213.086206: irq:softirq_raise: vec=1
[action=TIMER]
database 3224 [000] 213.086259: sched:sched_switch:
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==>
next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.086281: irq:softirq_entry: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.086303: irq:softirq_exit: vec=1
[action=TIMER]
ksoftirqd/0 3 [000] 213.086323: sched:sched_switch:
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=cat
next_pid=3197 next_prio=120
cat 3197 [000] 213.086439: sched:sched_switch: prev_comm=cat
prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198
next_prio=120
grep 3198 [000] 213.086539: sched:sched_switch: prev_comm=grep
prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle
next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.086593: printk:console: [ 213.086565]
ether_device poll
ethernet_idle 3065 [000] 213.086622: printk:console: [ 213.086612]
unmatched dd