I've recently encountered a problem after upgrading from 3.0.57-rt82 to 
3.10.47-rt50 where process_backlog gets interrupted and does not resume 
for a while, which results in packets not being processed in time.  I see 
net_rx_action, which then calls process_backlog (as the poll method to 
process the backlog of packets queued up the netif_rx) but then after the 
interruption, it does not finish for about 5ms.  In the older kernel it 
would finish based on the priority of ksoftirqd.  This is no longer the 
case.
 
I have priorities configured so that hard interrupts are highest, 
ksoftirqd next (both are SCHED_FIFO) and then my program is currently 
SCHED_OTHER but I still do not see the rx softirq finish before my program 
runs.

This is all on a single core powerpc device.  I do not see these problems 
with a net device which uses NAPI directly (as such i'm updating my driver 
to use NAPI) but it seems like there is a real bug here somewhere.  I have 
not been able to find any mention of similar problems (perhaps few people 
are using netif_rx these days).

I've attached a recording from perf which shows the problem. Specifically, 
you see net_rx_action run at time 213.079014 and then it doesn't finish 
until about 5ms later at time 213.084953 which i not the case on the older 
kernels.  It seems something has changed with softirq handling or 
process_backlog needed adapted for it.  My suspicion is this has something 
to do with the work mentioned in 210dc110063cf040d3209fddf766f6fcafccdc34 
but I'm not an expert with this area of the kernel.

Any tips or suggestions would be appreciated.  Please CC me as I am not 
subscribed to lkml.

Thanks,
Trevor



   ethernet_idle  3065 [000]   213.078439: printk:console: [  213.078426] send 
db
   ethernet_idle  3065 [000]   213.078458: printk:console: [  213.078449] 
ether_device xmit
   ethernet_idle  3065 [000]   213.078470: net:net_dev_queue: dev=eth2 
skbaddr=(nil) len=60
   ethernet_idle  3065 [000]   213.078492: printk:console: [  213.078482] enet 
xmit
   ethernet_idle  3065 [000]   213.078524: irq:irq_handler_entry: irq=21 
name=sel_enet_mac
   ethernet_idle  3065 [000]   213.078532: irq:irq_handler_exit: irq=21 
ret=handled
   ethernet_idle  3065 [000]   213.078552: net:net_dev_xmit: dev=eth2 
skbaddr=(nil) len=60 rc=0
   ethernet_idle  3065 [000]   213.078569: sched:sched_switch: 
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=R+ ==> 
next_comm=irq/21-sel_enet next_pid=3058 next_prio=49
 irq/21-sel_enet  3058 [000]   213.078603: printk:console: [  213.078586] enet 
rx
 irq/21-sel_enet  3058 [000]   213.078629: net:netif_rx: dev=eth2 skbaddr=(nil) 
len=46
 irq/21-sel_enet  3058 [000]   213.078637: irq:softirq_raise: vec=3 
[action=NET_RX]
 irq/21-sel_enet  3058 [000]   213.078655: printk:console: [  213.078645] enet 
tx
 irq/21-sel_enet  3058 [000]   213.078673: irq:softirq_entry: vec=3 
[action=NET_RX]
 irq/21-sel_enet  3058 [000]   213.078691: printk:console: [  213.078680] 
process_backlog entry
 irq/21-sel_enet  3058 [000]   213.078702: net:netif_receive_skb: dev=eth2 
skbaddr=(nil) len=46
 irq/21-sel_enet  3058 [000]   213.078726: printk:console: [  213.078711] 
ether_device recv
 irq/21-sel_enet  3058 [000]   213.078736: printk:console: [  213.078716] 
process_backlog exit 1
 irq/21-sel_enet  3058 [000]   213.078747: napi:napi_poll: napi poll on napi 
struct (nil) for device (no_device)
 irq/21-sel_enet  3058 [000]   213.078754: irq:softirq_exit: vec=3 
[action=NET_RX]
 irq/21-sel_enet  3058 [000]   213.078779: sched:sched_switch: 
prev_comm=irq/21-sel_enet prev_pid=3058 prev_prio=49 prev_state=S ==> 
next_comm=database next_pid=3224 next_prio=120
        database  3224 [000]   213.078965: net:net_dev_queue: dev=lo 
skbaddr=(nil) len=389
        database  3224 [000]   213.078988: net:netif_rx: dev=lo skbaddr=(nil) 
len=375
        database  3224 [000]   213.078996: irq:softirq_raise: vec=3 
[action=NET_RX]
        database  3224 [000]   213.079005: net:net_dev_xmit: dev=lo 
skbaddr=(nil) len=389 rc=0
        database  3224 [000]   213.079014: irq:softirq_entry: vec=3 
[action=NET_RX]
        database  3224 [000]   213.079052: printk:console: [  213.079026] 
process_backlog entry
        database  3224 [000]   213.079066: net:netif_receive_skb: dev=lo 
skbaddr=(nil) len=375
        database  3224 [000]   213.079152: net:net_dev_queue: dev=lo 
skbaddr=(nil) len=66
        database  3224 [000]   213.079204: irq:softirq_raise: vec=1 
[action=TIMER]
        database  3224 [000]   213.079263: sched:sched_switch: 
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> 
next_comm=ksoftirqd/0 next_pid=3 next_prio=2
     ksoftirqd/0     3 [000]   213.079284: irq:softirq_entry: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.079302: irq:softirq_exit: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.079322: sched:sched_switch: 
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=cat 
next_pid=3197 next_prio=120
             cat  3197 [000]   213.079489: sched:sched_switch: prev_comm=cat 
prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198 
next_prio=120
            grep  3198 [000]   213.079638: sched:sched_switch: prev_comm=grep 
prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle 
next_pid=3065 next_prio=120
   ethernet_idle  3065 [000]   213.079672: sched:sched_switch: 
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=S ==> 
next_comm=database next_pid=3224 next_prio=120
        database  3224 [000]   213.079694: net:netif_rx: dev=lo skbaddr=(nil) 
len=52
        database  3224 [000]   213.079702: irq:softirq_raise: vec=3 
[action=NET_RX]
        database  3224 [000]   213.079711: net:net_dev_xmit: dev=lo 
skbaddr=(nil) len=66 rc=0
        database  3224 [000]   213.079739: sched:sched_switch: 
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> 
next_comm=dbpool next_pid=3217 next_prio=120
          dbpool  3217 [000]   213.079770: sched:sched_switch: prev_comm=dbpool 
prev_pid=3217 prev_prio=120 prev_state=D ==> next_comm=db_ctl next_pid=3225 
next_prio=120
          db_ctl  3225 [000]   213.080213: irq:softirq_raise: vec=1 
[action=TIMER]
          db_ctl  3225 [000]   213.080270: sched:sched_switch: prev_comm=db_ctl 
prev_pid=3225 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3 
next_prio=2
     ksoftirqd/0     3 [000]   213.080294: irq:softirq_entry: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.080312: irq:softirq_exit: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.080332: sched:sched_switch: 
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> 
next_comm=ethernet_idle next_pid=3065 next_prio=120
   ethernet_idle  3065 [000]   213.080382: printk:console: [  213.080353] 
ether_device poll
   ethernet_idle  3065 [000]   213.080443: printk:console: [  213.080430] send 
dd
   ethernet_idle  3065 [000]   213.080464: printk:console: [  213.080454] 
ether_device xmit
   ethernet_idle  3065 [000]   213.080476: net:net_dev_queue: dev=eth2 
skbaddr=(nil) len=60
   ethernet_idle  3065 [000]   213.080498: printk:console: [  213.080487] enet 
xmit
   ethernet_idle  3065 [000]   213.080530: irq:irq_handler_entry: irq=21 
name=sel_enet_mac
   ethernet_idle  3065 [000]   213.080539: irq:irq_handler_exit: irq=21 
ret=handled
   ethernet_idle  3065 [000]   213.080560: net:net_dev_xmit: dev=eth2 
skbaddr=(nil) len=60 rc=0
   ethernet_idle  3065 [000]   213.080577: sched:sched_switch: 
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=R+ ==> 
next_comm=irq/21-sel_enet next_pid=3058 next_prio=49
 irq/21-sel_enet  3058 [000]   213.080611: printk:console: [  213.080595] enet 
rx
 irq/21-sel_enet  3058 [000]   213.080637: net:netif_rx: dev=eth2 skbaddr=(nil) 
len=46
 irq/21-sel_enet  3058 [000]   213.080656: printk:console: [  213.080645] enet 
tx
 irq/21-sel_enet  3058 [000]   213.080687: sched:sched_switch: 
prev_comm=irq/21-sel_enet prev_pid=3058 prev_prio=49 prev_state=S ==> 
next_comm=ethernet_idle next_pid=3065 next_prio=120
   ethernet_idle  3065 [000]   213.080714: sched:sched_switch: 
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=S ==> 
next_comm=database next_pid=3224 next_prio=120
        database  3224 [000]   213.080747: sched:sched_switch: 
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> 
next_comm=dbpool next_pid=3217 next_prio=120
          dbpool  3217 [000]   213.080801: sched:sched_switch: prev_comm=dbpool 
prev_pid=3217 prev_prio=120 prev_state=D ==> next_comm=database next_pid=3224 
next_prio=120
        database  3224 [000]   213.080825: sched:sched_switch: 
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> 
next_comm=dbpool next_pid=3217 next_prio=120
          dbpool  3217 [000]   213.081210: irq:softirq_raise: vec=1 
[action=TIMER]
          dbpool  3217 [000]   213.081264: sched:sched_switch: prev_comm=dbpool 
prev_pid=3217 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=3 
next_prio=2
     ksoftirqd/0     3 [000]   213.081286: irq:softirq_entry: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.081311: irq:softirq_exit: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.081331: sched:sched_switch: 
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbcl 
next_pid=3222 next_prio=120
            dbcl  3222 [000]   213.081581: sched:sched_switch: prev_comm=dbcl 
prev_pid=3222 prev_prio=120 prev_state=S ==> next_comm=cat next_pid=3197 
next_prio=120
             cat  3197 [000]   213.081733: sched:sched_switch: prev_comm=cat 
prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198 
next_prio=120
            grep  3198 [000]   213.081853: sched:sched_switch: prev_comm=grep 
prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=dbpool next_pid=3217 
next_prio=120
          dbpool  3217 [000]   213.082210: irq:softirq_raise: vec=1 
[action=TIMER]
          dbpool  3217 [000]   213.082264: sched:sched_switch: prev_comm=dbpool 
prev_pid=3217 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3 
next_prio=2
     ksoftirqd/0     3 [000]   213.082286: irq:softirq_entry: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.082297: irq:softirq_exit: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.082316: sched:sched_switch: 
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbcl 
next_pid=3222 next_prio=120
            dbcl  3222 [000]   213.082595: sched:sched_switch: prev_comm=dbcl 
prev_pid=3222 prev_prio=120 prev_state=S ==> next_comm=dbpool next_pid=3217 
next_prio=120
          dbpool  3217 [000]   213.082937: sched:sched_switch: prev_comm=dbpool 
prev_pid=3217 prev_prio=120 prev_state=S ==> next_comm=dbcl next_pid=3222 
next_prio=120
            dbcl  3222 [000]   213.083207: irq:softirq_raise: vec=1 
[action=TIMER]
            dbcl  3222 [000]   213.083260: sched:sched_switch: prev_comm=dbcl 
prev_pid=3222 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=3 
next_prio=2
     ksoftirqd/0     3 [000]   213.083281: irq:softirq_entry: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.083291: irq:softirq_exit: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.083310: sched:sched_switch: 
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbcl 
next_pid=3222 next_prio=120
            dbcl  3222 [000]   213.083455: sched:sched_switch: prev_comm=dbcl 
prev_pid=3222 prev_prio=120 prev_state=S ==> next_comm=dbpool next_pid=3217 
next_prio=120
          dbpool  3217 [000]   213.084211: irq:softirq_raise: vec=1 
[action=TIMER]
          dbpool  3217 [000]   213.084255: net:net_dev_queue: dev=lo 
skbaddr=(nil) len=81
          dbpool  3217 [000]   213.084274: sched:sched_switch: prev_comm=dbpool 
prev_pid=3217 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3 
next_prio=2
     ksoftirqd/0     3 [000]   213.084297: irq:softirq_entry: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.084306: irq:softirq_exit: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.084325: sched:sched_switch: 
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbpool 
next_pid=3217 next_prio=120
          dbpool  3217 [000]   213.084345: net:netif_rx: dev=lo skbaddr=(nil) 
len=67
          dbpool  3217 [000]   213.084354: net:net_dev_xmit: dev=lo 
skbaddr=(nil) len=81 rc=0
          dbpool  3217 [000]   213.084495: sched:sched_switch: prev_comm=dbpool 
prev_pid=3217 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle 
next_pid=3065 next_prio=120
   ethernet_idle  3065 [000]   213.084558: printk:console: [  213.084528] 
ether_device poll
   ethernet_idle  3065 [000]   213.084591: printk:console: [  213.084581] 
WARNING 0: 1 datagram timed out!
   ethernet_idle  3065 [000]   213.084650: printk:console: [  213.084639] send 
df
   ethernet_idle  3065 [000]   213.084668: printk:console: [  213.084659] 
ether_device xmit
   ethernet_idle  3065 [000]   213.084680: net:net_dev_queue: dev=eth2 
skbaddr=(nil) len=60
   ethernet_idle  3065 [000]   213.084702: printk:console: [  213.084692] enet 
xmit
   ethernet_idle  3065 [000]   213.084734: irq:irq_handler_entry: irq=21 
name=sel_enet_mac
   ethernet_idle  3065 [000]   213.084742: irq:irq_handler_exit: irq=21 
ret=handled
   ethernet_idle  3065 [000]   213.084763: net:net_dev_xmit: dev=eth2 
skbaddr=(nil) len=60 rc=0
   ethernet_idle  3065 [000]   213.084780: sched:sched_switch: 
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=R+ ==> 
next_comm=irq/21-sel_enet next_pid=3058 next_prio=49
 irq/21-sel_enet  3058 [000]   213.084815: printk:console: [  213.084798] enet 
rx
 irq/21-sel_enet  3058 [000]   213.084841: net:netif_rx: dev=eth2 skbaddr=(nil) 
len=46
 irq/21-sel_enet  3058 [000]   213.084860: printk:console: [  213.084849] enet 
tx
 irq/21-sel_enet  3058 [000]   213.084892: sched:sched_switch: 
prev_comm=irq/21-sel_enet prev_pid=3058 prev_prio=49 prev_state=S ==> 
next_comm=database next_pid=3224 next_prio=120
        database  3224 [000]   213.084925: printk:console: [  213.084910] 
process_backlog exit 1
        database  3224 [000]   213.084936: napi:napi_poll: napi poll on napi 
struct (nil) for device (no_device)
        database  3224 [000]   213.084945: irq:softirq_raise: vec=3 
[action=NET_RX]
        database  3224 [000]   213.084953: irq:softirq_exit: vec=3 
[action=NET_RX]
        database  3224 [000]   213.084962: irq:softirq_entry: vec=3 
[action=NET_RX]
        database  3224 [000]   213.084979: printk:console: [  213.084969] 
process_backlog entry
        database  3224 [000]   213.084990: net:netif_receive_skb: dev=lo 
skbaddr=(nil) len=52
        database  3224 [000]   213.085025: net:netif_receive_skb: dev=eth2 
skbaddr=(nil) len=46
        database  3224 [000]   213.085045: printk:console: [  213.085032] 
ether_device recv
        database  3224 [000]   213.085057: net:netif_receive_skb: dev=lo 
skbaddr=(nil) len=67
        database  3224 [000]   213.085072: net:netif_receive_skb: dev=eth2 
skbaddr=(nil) len=46
        database  3224 [000]   213.085090: printk:console: [  213.085078] 
ether_device recv
        database  3224 [000]   213.085109: printk:console: [  213.085099] 
process_backlog exit 1
        database  3224 [000]   213.085118: napi:napi_poll: napi poll on napi 
struct (nil) for device (no_device)
        database  3224 [000]   213.085124: irq:softirq_exit: vec=3 
[action=NET_RX]
        database  3224 [000]   213.085199: irq:softirq_raise: vec=1 
[action=TIMER]
        database  3224 [000]   213.085247: sched:sched_switch: 
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> 
next_comm=ksoftirqd/0 next_pid=3 next_prio=2
     ksoftirqd/0     3 [000]   213.085268: irq:softirq_entry: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.085284: irq:softirq_exit: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.085304: sched:sched_switch: 
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=cat 
next_pid=3197 next_prio=120
             cat  3197 [000]   213.085488: sched:sched_switch: prev_comm=cat 
prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198 
next_prio=120
            grep  3198 [000]   213.085656: sched:sched_switch: prev_comm=grep 
prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle 
next_pid=3065 next_prio=120
   ethernet_idle  3065 [000]   213.085689: sched:sched_switch: 
prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=S ==> 
next_comm=database next_pid=3224 next_prio=120
        database  3224 [000]   213.085812: net:net_dev_queue: dev=lo 
skbaddr=(nil) len=66
        database  3224 [000]   213.085835: net:netif_rx: dev=lo skbaddr=(nil) 
len=52
        database  3224 [000]   213.085843: irq:softirq_raise: vec=3 
[action=NET_RX]
        database  3224 [000]   213.085852: net:net_dev_xmit: dev=lo 
skbaddr=(nil) len=66 rc=0
        database  3224 [000]   213.085866: irq:softirq_entry: vec=3 
[action=NET_RX]
        database  3224 [000]   213.085905: printk:console: [  213.085878] 
process_backlog entry
        database  3224 [000]   213.085920: net:netif_receive_skb: dev=lo 
skbaddr=(nil) len=52
        database  3224 [000]   213.086000: printk:console: [  213.085979] 
process_backlog exit 1
        database  3224 [000]   213.086016: napi:napi_poll: napi poll on napi 
struct (nil) for device (no_device)
        database  3224 [000]   213.086024: irq:softirq_exit: vec=3 
[action=NET_RX]
        database  3224 [000]   213.086206: irq:softirq_raise: vec=1 
[action=TIMER]
        database  3224 [000]   213.086259: sched:sched_switch: 
prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> 
next_comm=ksoftirqd/0 next_pid=3 next_prio=2
     ksoftirqd/0     3 [000]   213.086281: irq:softirq_entry: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.086303: irq:softirq_exit: vec=1 
[action=TIMER]
     ksoftirqd/0     3 [000]   213.086323: sched:sched_switch: 
prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=cat 
next_pid=3197 next_prio=120
             cat  3197 [000]   213.086439: sched:sched_switch: prev_comm=cat 
prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198 
next_prio=120
            grep  3198 [000]   213.086539: sched:sched_switch: prev_comm=grep 
prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle 
next_pid=3065 next_prio=120
   ethernet_idle  3065 [000]   213.086593: printk:console: [  213.086565] 
ether_device poll
   ethernet_idle  3065 [000]   213.086622: printk:console: [  213.086612] 
unmatched dd

Reply via email to