Kevin Darbyshire-Bryant <ke...@darbyshire-bryant.me.uk> writes: >> On 4 Mar 2019, at 08:39, Pete Heist <p...@heistp.net> wrote: >> >> >>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant >>> <ke...@darbyshire-bryant.me.uk> wrote: >>> >>> The very bad idea: >>> >>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark >>> implementation as described above. So an awful lot of our >>> shenanigans above is due to DSCP not traversing the internet >>> particularly well. The solution above abstracts DSCP into ’tins’ >>> which we put into fwmarks. Another approach would be to put the DSCP >>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained >>> DSCP into the diffserv field onto the actual packets. Voila DSCP >>> traversal across ’tinternet with tin/bandwidth allocation in our >>> local domain preserved. >> >> If I understand it right, another use case for this “very bad idea” >> is preserving DSCP locally while traversing upstream WiFi links as >> besteffort, which avoids airtime efficiency problems that can occur >> with 802.11e (WMM). In cases where the router config can’t be changed >> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as >> it hides DSCP from the WiFi stack while preserving the values through >> the tunnel, but this would be easier. Neat… :) > > Everyone has understood the intent & maybe the implementation > correctly. 2 patches attached, one for cake, one for tc. > > They are naively coded and some of it undoes Toke’s recent tidying up > (sorry!)
Heh. First comment: Don't do that ;) A few more below. > 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A > diff --git a/pkt_sched.h b/pkt_sched.h > index a2f570c..d1f288d 100644 > --- a/pkt_sched.h > +++ b/pkt_sched.h > @@ -879,6 +879,7 @@ enum { > TCA_CAKE_ACK_FILTER, > TCA_CAKE_SPLIT_GSO, > TCA_CAKE_FWMARK, > + TCA_CAKE_ICING, > __TCA_CAKE_MAX > }; > #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1) > diff --git a/sch_cake.c b/sch_cake.c > index 733b897..5aca0f3 100644 > --- a/sch_cake.c > +++ b/sch_cake.c > @@ -270,7 +270,8 @@ enum { > CAKE_FLAG_INGRESS = BIT(2), > CAKE_FLAG_WASH = BIT(3), > CAKE_FLAG_SPLIT_GSO = BIT(4), > - CAKE_FLAG_FWMARK = BIT(5) > + CAKE_FLAG_FWMARK = BIT(5), > + CAKE_FLAG_ICING = BIT(6) This implies that icing and fwmark can be enabled completely independently of each other. Are you sure about the semantics for that? > }; > > /* COBALT operates the Codel and BLUE algorithms in parallel, in order to > @@ -333,7 +334,7 @@ static const u8 diffserv8[] = { > }; > > static const u8 diffserv4[] = { > - 0, 2, 0, 0, 2, 0, 0, 0, > + 0, 1, 0, 0, 2, 0, 0, 0, > 1, 0, 0, 0, 0, 0, 0, 0, > 2, 0, 2, 0, 2, 0, 2, 0, > 2, 0, 2, 0, 2, 0, 2, 0, > @@ -344,7 +345,7 @@ static const u8 diffserv4[] = { > }; > > static const u8 diffserv3[] = { > - 0, 0, 0, 0, 2, 0, 0, 0, > + 0, 1, 0, 0, 2, 0, 0, 0, Why are you messing with the diffserv mappings in this patch? > 1, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, > @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, > struct sk_buff **to_free) > return idx + (tin << 16); > } > > -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) > +void cake_update_diffserv(struct sk_buff *skb, u8 dscp) > +{ > + switch (skb->protocol) { > + case htons(ETH_P_IP): > + if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp) > + ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp); > + break; > + case htons(ETH_P_IPV6): > + if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp) > + ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp); > + break; > + default: > + break; > + } > + > +} So washing is just a special case of this (wash is cake_update_diffserv(skb,0)). So you shouldn't need to add another function, just augment the existing handling code. > +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash) > { > u8 dscp; > > @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, > u16 wash) > } > } > > +#if IS_REACHABLE(CONFIG_NF_CONNTRACK) Save an ifdef below by moving the ifdef inside the function definition. > +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp) > +{ > + enum ip_conntrack_info ctinfo; > + struct nf_conn *ct; > + > + ct = nf_ct_get(skb, &ctinfo); > + if (!ct) > + return; > + > + ct->mark &= 0x80ffffff; > + ct->mark |= (0x40 | dscp) << 24; Right, so we *might* have an argument that putting the *tin* into the fwmark is CAKE's business, but copying over the dscp mark is not something a qdisc should be doing... > + nf_conntrack_event_cache(IPCT_MARK, ct); > +} > +#endif Also, are you sure this will work in all permutations of conntrack being a module vs not etc? (we had to jump through quite some hoops to get the conntrack hooks to work last time; this is probably my biggest worry here). > static struct cake_tin_data *cake_select_tin(struct Qdisc *sch, > struct sk_buff *skb) > { > struct cake_sched_data *q = qdisc_priv(sch); > - u32 tin; > + bool wash; > u8 dscp; > + u8 tin; > > - /* Tin selection: Default to diffserv-based selection, allow overriding > - * using firewall marks or skb->priority. > - */ > - dscp = cake_handle_diffserv(skb, > - q->rate_flags & CAKE_FLAG_WASH); > + wash = !!(q->rate_flags & CAKE_FLAG_WASH); > + > + if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) { > > - if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) > tin = 0; > + if (wash) > + cake_update_diffserv(skb, 0); > > - else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ > - skb->mark && > - skb->mark <= q->tin_cnt) > - tin = q->tin_order[skb->mark - 1]; > + } else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */ > + TC_H_MIN(skb->priority) > 0 && > + TC_H_MIN(skb->priority) <= q->tin_cnt) { > > - else if (TC_H_MAJ(skb->priority) == sch->handle && > - TC_H_MIN(skb->priority) > 0 && > - TC_H_MIN(skb->priority) <= q->tin_cnt) > tin = q->tin_order[TC_H_MIN(skb->priority) - 1]; > + if (wash) > + cake_update_diffserv(skb, 0); > + > + } else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ > + skb->mark & 0x40000000) { > + > + dscp = skb->mark >> 24 & 0x3f; > + tin = q->tin_index[dscp]; > > - else { > + if (wash) > + cake_update_diffserv(skb, 0); > + else if (q->rate_flags & CAKE_FLAG_ICING) > + cake_update_diffserv(skb, dscp << 2); > + > + } else { /* fallback to DSCP */ > + /* extract the Diffserv Precedence field, if it exists */ > + /* and clear DSCP bits if washing */ > + dscp = cake_handle_diffserv(skb, wash); > tin = q->tin_index[dscp]; As I said above, no reason to revert the cleanup commit... > if (unlikely(tin >= q->tin_cnt)) > tin = 0; > + > +#if IS_REACHABLE(CONFIG_NF_CONNTRACK) > + if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & > CAKE_FLAG_INGRESS)) > + cake_update_ct_mark(skb, dscp); > +#endif See above about moving the ifdef and losing this one. > } > > return &q->tins[tin]; > @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct > nlattr *opt, > q->rate_flags &= ~CAKE_FLAG_FWMARK; > } > > + if (tb[TCA_CAKE_ICING]) { > + if (!!nla_get_u32(tb[TCA_CAKE_ICING])) > + q->rate_flags |= CAKE_FLAG_ICING; > + else > + q->rate_flags &= ~CAKE_FLAG_ICING; > + } > + > if (q->tins) { > sch_tree_lock(sch); > cake_reconfigure(sch); > @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff > *skb) > !!(q->rate_flags & CAKE_FLAG_FWMARK))) > goto nla_put_failure; > > + if (nla_put_u32(skb, TCA_CAKE_ICING, > + !!(q->rate_flags & CAKE_FLAG_ICING))) > + goto nla_put_failure; > + > return nla_nest_end(skb, opts); > > nla_put_failure: > From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001 > From: Kevin Darbyshire-Bryant <l...@darbyshire-bryant.me.uk> > Date: Wed, 27 Feb 2019 14:46:05 +0000 > Subject: [PATCH] cake: add fwmark & icing options > > Signed-off-by: Kevin Darbyshire-Bryant <l...@darbyshire-bryant.me.uk> > --- > include/uapi/linux/pkt_sched.h | 2 ++ > man/man8/tc-cake.8 | 19 ++++++++++++++++ > tc/q_cake.c | 40 ++++++++++++++++++++++++++++++++++ > 3 files changed, 61 insertions(+) > > diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h > index 01f96352..f2b1b270 100644 > --- a/include/uapi/linux/pkt_sched.h > +++ b/include/uapi/linux/pkt_sched.h > @@ -954,6 +954,8 @@ enum { > TCA_CAKE_INGRESS, > TCA_CAKE_ACK_FILTER, > TCA_CAKE_SPLIT_GSO, > + TCA_CAKE_FWMARK, > + TCA_CAKE_ICING, > __TCA_CAKE_MAX > }; > #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1) > diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8 > index eda436e1..626d4525 100644 > --- a/man/man8/tc-cake.8 > +++ b/man/man8/tc-cake.8 > @@ -73,6 +73,12 @@ TIME | > ] > .br > [ > +.BR fwmark > +| > +.BR nofwmark* > +] > +.br > +[ > .BR split-gso* > | > .BR no-split-gso > @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be > used as both source and > destination host. > > > +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS > + > +In addition to TC FILTER tin classification, firewall marks may also > optionally > +be used. The priority order (highest to lowest) for tin selection is TC > filter, > +firewall mark and then DSCP. > +.PP > +.B fwmark > + > +.br > + Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to > the > +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4. > +Values outside the valid range are ignored and CAKE will fall back to using > +DSCP for tin selection. This should document the masking and shifting. > > .SH EXAMPLES > # tc qdisc delete root dev eth0 > diff --git a/tc/q_cake.c b/tc/q_cake.c > index e827e3f1..fdafd3b7 100644 > --- a/tc/q_cake.c > +++ b/tc/q_cake.c > @@ -79,6 +79,8 @@ static void explain(void) > " dual-srchost | dual-dsthost | triple-isolate* ]\n" > " [ nat | nonat* ]\n" > " [ wash | nowash* ]\n" > +" [ icing | noicing* ]\n" > +" [ fwmark | nofwmark* ]\n" Much as I appreciate the wordplay, I'm not sure this is actually going to be super helpful for something just trying to make sense of the command output. Not sure I have a better suggestion, though... -Toke _______________________________________________ Cake mailing list Cake@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/cake