A 0-length packet? Maybe coming out of the new GSO/GRO code? Check truesize also?
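To make the 0-length idea concrete, here is a toy user-space model, not the kernel code. Every name in it (model_sched, model_tin, model_enqueue and so on) is made up for illustration, and whether this is what actually happens on mlx5 is only a guess. It shows that a packet which bumps the packet counter but adds 0 bytes to a tin's byte backlog gives the same "qlen > 0, total backlog 0" signature as the debug output quoted below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the qdisc state; NOT the kernel structs. */
#define MODEL_MAX_TINS 8

struct model_tin {
	uint32_t backlog_bytes;	/* plays the role of tin_backlog */
};

struct model_sched {
	uint32_t qlen;		/* plays the role of sch->q.qlen */
	struct model_tin tins[MODEL_MAX_TINS];
};

/* Enqueue accounting: one more packet, len more bytes in the chosen tin. */
static void model_enqueue(struct model_sched *s, unsigned int tin, uint32_t len)
{
	assert(tin < MODEL_MAX_TINS);
	s->qlen++;
	s->tins[tin].backlog_bytes += len;
}

/* Sum of the per-tin byte backlogs. */
static uint64_t model_total_backlog(const struct model_sched *s)
{
	uint64_t tot = 0;
	size_t n;

	for (n = 0; n < MODEL_MAX_TINS; n++)
		tot += s->tins[n].backlog_bytes;
	return tot;
}

/* True when packets are counted but no bytes are accounted for. */
static bool model_inconsistent(const struct model_sched *s)
{
	return s->qlen > 0 && model_total_backlog(s) == 0;
}
```

Calling model_enqueue() with len 0 on a zeroed model_sched makes model_inconsistent() return true. A check of the same shape next to the loop counters in the real cake_dequeue() might help tell whether the state is already wrong at enqueue time, but that part is untested.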

On Thu, Jul 5, 2018 at 4:48 PM Georgios Amanakis <[email protected]> wrote:
>
> I am going to give it a try, with your patch applied tonight and report.
> Thank you!
>
> George
>
> On Thu, Jul 5, 2018, 6:31 PM Toke Høiland-Jørgensen <[email protected]> wrote:
>>
>> Toke Høiland-Jørgensen <[email protected]> writes:
>>
>> > Jonathan Morton <[email protected]> writes:
>> >
>> >>> On 3 Jul, 2018, at 1:23 am, Toke Høiland-Jørgensen <[email protected]> wrote:
>> >>>
>> >>> My hunch is that this has something to do with the way mlx5 uses
>> >>> multiple receive queues (and thus multiple CPUs). Which is probably
>> >>> different from veth...
>> >>
>> >> At this stage I'm pretty confident it has nothing to do with Cake, and
>> >> everything to do with the Mellanox hardware and driver. It does strike
>> >> me that Linux' default handling of multiqueue hardware doesn't map
>> >> very well to the qdisc interface.
>> >
>> > Well, it doesn't happen with fq_codel, so even if it is a driver bug, it
>> > is being triggered by cake specifically...
>>
>> Right, so finally got some time to investigate this further.
>>
>> I suspected that cake_dequeue() was looping forever, so I added some
>> debug statements to investigate this; and turns out I was right. Using
>> the debug patch below, in unlimited mode I get loop aborts on loop 'i'
>> for unlimited mode and loop 'l' if I enable the shaper at 70 gbit. It
>> happens pretty reliably, but only when I load up the link sufficiently
>> (need 4-6 TCP flows which get ~50 Gbps of total throughput).
>>
>> The weird thing is that what appears to be happening, is that cake
>> somehow gets into a state where sch->q.qlen is >0 while all tin backlogs
>> are 0. I have no clue how this happens; as far as I can tell, all
>> changes to tin_backlog are paired with a change to q.qlen. The only
>> thing outside of cake itself that modifies q.qlen is peek(), which is
>> not being used here.
>>
>> I'm giving up for tonight; if anyone else has any ideas, I'm all ears.
>>
>> -Toke
>>
>> Sample debug output:
>>
>> [ 5456.068281] Loop counter i hit 100k; aborting! i 100001 j 0 k 180 l 3 m 0 qlen 2 qbkllog 33184 tin 2 deficit 172 tot backlog 0
>>
>> With this debug patch:
>>
>> @@ -1892,6 +1892,20 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
>>  	u64 delay;
>>  	u32 len;
>>
>> +	int i=0,j=0,k=0,l=0,m=0;
>> +
>> +#define COUNT_LOOP(v) do {						\
>> +		if (++v > 100000) {					\
>> +			int tot_bkl = 0;				\
>> +			struct cake_tin_data *t;			\
>> +			int n;						\
>> +			for(n=0,t = q->tins; n < CAKE_MAX_TINS; n++,t++) \
>> +				tot_bkl += t->tin_backlog;		\
>> +			net_warn_ratelimited("Loop counter " #v " hit 100k; aborting! i %d j %d k %d l %d m %d qlen %d qbkllog %d tin %d deficit %d tot backlog %d", i, j, k, l, m, sch->q.qlen, sch->qstats.backlog, q->cur_tin, b->tin_deficit, tot_bkl); \
>> +			return NULL;					\
>> +		}							\
>> +	} while(0);
>> +
>>  begin:
>>  	if (!sch->q.qlen)
>>  		return NULL;
>> @@ -1912,6 +1926,7 @@ begin:
>>  		/* In unlimited mode, can't rely on shaper timings, just balance
>>  		 * with DRR
>>  		 */
>> +		i=0;
>>  		while (b->tin_deficit < 0 ||
>>  		       !(b->sparse_flow_count + b->bulk_flow_count)) {
>>  			if (b->tin_deficit <= 0)
>> @@ -1923,6 +1938,7 @@ begin:
>>  				q->cur_tin = 0;
>>  				b = q->tins;
>>  			}
>> +			COUNT_LOOP(i);
>>  		}
>>  	} else {
>>  		/* In shaped mode, choose:
>> @@ -1960,8 +1976,10 @@ retry:
>>  			head = &b->old_flows;
>>  			if (unlikely(list_empty(head))) {
>>  				head = &b->decaying_flows;
>> -				if (unlikely(list_empty(head)))
>> +				if (unlikely(list_empty(head))) {
>> +					COUNT_LOOP(j);
>>  					goto begin;
>> +				}
>>  			}
>>  		}
>>  	}
>> @@ -2008,6 +2026,7 @@ retry:
>>  				flow->set = CAKE_SET_SPARSE_WAIT;
>>  			}
>>  		}
>> +		COUNT_LOOP(k);
>>  		goto retry;
>>  	}
>>
>> @@ -2050,6 +2069,7 @@ retry:
>>  				srchost->srchost_refcnt--;
>>  				dsthost->dsthost_refcnt--;
>>  			}
>> +			COUNT_LOOP(l);
>>  			goto begin;
>>  		}
>>
>> @@ -2075,6 +2095,8 @@ retry:
>>  		kfree_skb(skb);
>>  		if (q->rate_flags & CAKE_FLAG_INGRESS)
>>  			goto retry;
>> +
>> +		COUNT_LOOP(m);
>>  	}
>>
>>  	b->tin_ecn_mark += !!flow->cvars.ecn_marked;
>>
>>
> _______________________________________________
> Cake mailing list
> [email protected]
> https://lists.bufferbloat.net/listinfo/cake

--

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
