Pete Heist <p...@heistp.net> writes:

>> On Jul 6, 2018, at 1:33 PM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:
>>
>> AHA! Found the culprit!
>>
>> The bulk dequeue mechanism in sch_generic.c will dequeue a bunch of
>> packets at once, then check if they belong on the same hardware txq. If
>> they don't, they will be put back on a separate queue in the qdisc
>> structure (sch->skb_bad_txq), and the qlen will be increased, without
>> telling the qdisc about it.
>
> Solid, nice work!

Thanks :)

>> This obviously only happens on hardware with multiple TXQs, which is why
>> the bug doesn't happen on veth.
>
> It would be nice if veth were mq capable.
>
> For whatever reason, I didn’t see this on my i210at’s (1gbit ethernet
> with 4 transmit and 4 receive queues).

Well, you have to hit the exact conditions; i.e., a bulk dequeue that
happens to get a bunch of packets that hit different TX queues. So that
depends on both the TXQ hashing, and the queue state, number of flows
etc. I only get a handful of "lockups" (debug lines) on a 10-sec netperf
test with 6 flows.

> I’m now playing with netem, cake and veth for the first time (two
> namespaces with netem as the parent qdisc to cake for each namespace).
> I’ve gotten the setup not to lock up in an infinite loop but to
> occasionally stop passing traffic sometimes after a netperf test. This
> could easily be a problem specific to netns though, so I’ll be playing
> with it some more and will post if I can narrow it down to something
> specific.

Yay, more fun! :P

Please do see if you can narrow this down; it would be good to fix this
as well before we submit another version upstream...

-Toke
_______________________________________________
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake
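
For illustration only, the accounting mismatch described in this thread
can be modelled with a small toy simulation. This is a sketch under
stated assumptions, not kernel code: ToyQdisc, internal_count and
bulk_dequeue are hypothetical names, and the only things taken from the
thread are the idea of a shared qlen counter and a separate requeue list
(sch->skb_bad_txq) that the qdisc itself is not told about.

```python
from collections import deque


class ToyQdisc:
    """Toy stand-in for a qdisc that keeps its own packet count (hypothetical)."""

    def __init__(self):
        self.queue = deque()      # packets the qdisc itself holds
        self.internal_count = 0   # the qdisc's own idea of its backlog
        self.qlen = 0             # shared counter, modelled on sch->q.qlen
        self.bad_txq = deque()    # requeue list, modelled on sch->skb_bad_txq

    def enqueue(self, pkt):
        self.queue.append(pkt)
        self.internal_count += 1
        self.qlen += 1

    def dequeue(self):
        if not self.queue:
            return None
        self.internal_count -= 1
        self.qlen -= 1
        return self.queue.popleft()


def bulk_dequeue(q, budget):
    """Pull up to `budget` packets destined for the same TX queue.

    A packet that maps to a different TX queue is parked on q.bad_txq and
    q.qlen is bumped, but q.internal_count is left alone. That is the
    "without telling the qdisc about it" part of the description.
    """
    first = q.dequeue()
    if first is None:
        return []
    batch = [first]
    for _ in range(budget - 1):
        pkt = q.dequeue()
        if pkt is None:
            break
        if pkt["txq"] == first["txq"]:
            batch.append(pkt)
        else:
            q.bad_txq.append(pkt)
            q.qlen += 1  # shared counter grows behind the qdisc's back
            break
    return batch


q = ToyQdisc()
for txq in (0, 0, 1, 0):
    q.enqueue({"txq": txq})

batch = bulk_dequeue(q, budget=4)
print(len(batch), q.internal_count, q.qlen, len(q.bad_txq))
```

In this toy run the shared qlen ends up one higher than the qdisc's own
count, because the parked packet is counted in qlen but is no longer in
the qdisc's queue. On a single-TXQ device such as veth every packet maps
to the same queue, so the mismatch branch is never taken.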