Kevin Darbyshire-Bryant <ke...@darbyshire-bryant.me.uk> writes:

>> On 4 Mar 2019, at 08:39, Pete Heist <p...@heistp.net> wrote:
>> 
>> 
>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant 
>>> <ke...@darbyshire-bryant.me.uk> wrote:
>>> 
>>> The very bad idea:
>>> 
>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>> implementation as described above. So an awful lot of our
>>> shenanigans above is due to DSCP not traversing the internet
>>> particularly well. The solution above abstracts DSCP into ’tins’
>>> which we put into fwmarks. Another approach would be to put the DSCP
>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>> local domain preserved.
>> 
>> If I understand it right, another use case for this “very bad idea”
>> is preserving DSCP locally while traversing upstream WiFi links as
>> besteffort, which avoids airtime efficiency problems that can occur
>> with 802.11e (WMM). In cases where the router config can’t be changed
>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>> it hides DSCP from the WiFi stack while preserving the values through
>> the tunnel, but this would be easier. Neat… :)
>
> Everyone has understood the intent & maybe the implementation
> correctly. 2 patches attached, one for cake, one for tc.
>
> They are naively coded and some of it undoes Toke’s recent tidying up
> (sorry!)

Heh. First comment: Don't do that ;)

A few more below.

> diff --git a/pkt_sched.h b/pkt_sched.h
> index a2f570c..d1f288d 100644
> --- a/pkt_sched.h
> +++ b/pkt_sched.h
> @@ -879,6 +879,7 @@ enum {
>       TCA_CAKE_ACK_FILTER,
>       TCA_CAKE_SPLIT_GSO,
>       TCA_CAKE_FWMARK,
> +     TCA_CAKE_ICING,
>       __TCA_CAKE_MAX
>  };
>  #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1)
> diff --git a/sch_cake.c b/sch_cake.c
> index 733b897..5aca0f3 100644
> --- a/sch_cake.c
> +++ b/sch_cake.c
> @@ -270,7 +270,8 @@ enum {
>       CAKE_FLAG_INGRESS          = BIT(2),
>       CAKE_FLAG_WASH             = BIT(3),
>       CAKE_FLAG_SPLIT_GSO        = BIT(4),
> -     CAKE_FLAG_FWMARK           = BIT(5)
> +     CAKE_FLAG_FWMARK           = BIT(5),
> +     CAKE_FLAG_ICING            = BIT(6)

This implies that icing and fwmark can be enabled completely
independently of each other. Are you sure about the semantics for that?
>  };
>  
>  /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>  };
>  
>  static const u8 diffserv4[] = {
> -     0, 2, 0, 0, 2, 0, 0, 0,
> +     0, 1, 0, 0, 2, 0, 0, 0,
>       1, 0, 0, 0, 0, 0, 0, 0,
>       2, 0, 2, 0, 2, 0, 2, 0,
>       2, 0, 2, 0, 2, 0, 2, 0,
> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>  };
>  
>  static const u8 diffserv3[] = {
> -     0, 0, 0, 0, 2, 0, 0, 0,
> +     0, 1, 0, 0, 2, 0, 0, 0,

Why are you messing with the diffserv mappings in this patch?

>       1, 0, 0, 0, 0, 0, 0, 0,
>       0, 0, 0, 0, 0, 0, 0, 0,
>       0, 0, 0, 0, 0, 0, 0, 0,
> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>       return idx + (tin << 16);
>  }
>  
> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
> +{
> +     switch (skb->protocol) {
> +     case htons(ETH_P_IP):
> +             if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> +                     ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
> +             break;
> +     case htons(ETH_P_IPV6):
> +             if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> +                     ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
> +             break;
> +     default:
> +             break;
> +     }
> +
> +}

Washing is just a special case of this (wash is
cake_update_diffserv(skb, 0)), so you shouldn't need to add another
function; just augment the existing handling code.
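To illustrate the point, here is a rough userspace sketch, not kernel code.
The names set_dscp() and wash() are made up for this example, and the DS
field is modelled as a plain byte instead of being read from an skb. It
assumes the usual layout of six DSCP bits above two ECN bits.

```c
#include <assert.h>
#include <stdint.h>

#define ECN_MASK 0x03u /* low two bits of the DS field carry ECN */

/* Hypothetical model: replace the 6-bit DSCP, leave the ECN bits alone. */
uint8_t set_dscp(uint8_t dsfield, uint8_t dscp)
{
	assert(dscp <= 0x3f);
	return (uint8_t)((dsfield & ECN_MASK) | (dscp << 2));
}

/* Washing is the same operation with a DSCP of zero. */
uint8_t wash(uint8_t dsfield)
{
	return set_dscp(dsfield, 0);
}
```

With this shape, a single helper covers both the wash case and the "restore
DSCP from the mark" case, which is why a second function looks unnecessary.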

> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>  {
>       u8 dscp;
>  
> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>       }
>  }
>  
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)

Save an ifdef below by moving the ifdef inside the function definition.

> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
> +{
> +     enum ip_conntrack_info ctinfo;
> +     struct nf_conn *ct;
> +
> +     ct = nf_ct_get(skb, &ctinfo);
> +     if (!ct)
> +             return;
> +
> +     ct->mark &= 0x80ffffff;
> +     ct->mark |= (0x40 | dscp) << 24;

Right, so we *might* have an argument that putting the *tin* into the
fwmark is CAKE's business, but copying over the dscp mark is not
something a qdisc should be doing...

> +     nf_conntrack_event_cache(IPCT_MARK, ct);
> +}
> +#endif

Also, are you sure this will work in all permutations of conntrack being
a module vs not etc? (we had to jump through quite some hoops to get the
conntrack hooks to work last time; this is probably my biggest worry here).

>  static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>                                            struct sk_buff *skb)
>  {
>       struct cake_sched_data *q = qdisc_priv(sch);
> -     u32 tin;
> +     bool wash;
>       u8 dscp;
> +     u8 tin;
>  
> -     /* Tin selection: Default to diffserv-based selection, allow overriding
> -      * using firewall marks or skb->priority.
> -      */
> -     dscp = cake_handle_diffserv(skb,
> -                                 q->rate_flags & CAKE_FLAG_WASH);
> +     wash = !!(q->rate_flags & CAKE_FLAG_WASH);
> +
> +     if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) {
>  
> -     if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
>               tin = 0;
> +             if (wash)
> +                     cake_update_diffserv(skb, 0);
>  
> -     else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
> -              skb->mark &&
> -              skb->mark <= q->tin_cnt)
> -             tin = q->tin_order[skb->mark - 1];
> +     } else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
> +                TC_H_MIN(skb->priority) > 0 &&
> +                TC_H_MIN(skb->priority) <= q->tin_cnt) {
>  
> -     else if (TC_H_MAJ(skb->priority) == sch->handle &&
> -              TC_H_MIN(skb->priority) > 0 &&
> -              TC_H_MIN(skb->priority) <= q->tin_cnt)
>               tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
> +             if (wash)
> +                     cake_update_diffserv(skb, 0);
> +
> +     } else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
> +                skb->mark & 0x40000000) {
> +
> +             dscp = skb->mark >> 24 & 0x3f;
> +             tin = q->tin_index[dscp];
>  
> -     else {
> +             if (wash)
> +                     cake_update_diffserv(skb, 0);
> +             else if (q->rate_flags & CAKE_FLAG_ICING)
> +                     cake_update_diffserv(skb, dscp << 2);
> +
> +     } else { /* fallback to DSCP */
> +             /* extract the Diffserv Precedence field, if it exists */
> +             /* and clear DSCP bits if washing */
> +             dscp = cake_handle_diffserv(skb, wash);
>               tin = q->tin_index[dscp];

As I said above, no reason to revert the cleanup commit...

>               if (unlikely(tin >= q->tin_cnt))
>                       tin = 0;
> +
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> +             if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
> +                     cake_update_ct_mark(skb, dscp);
> +#endif

See above about moving the ifdef and losing this one.
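A minimal userspace sketch of the pattern, under stated assumptions:
HAVE_CONNTRACK is a hypothetical stand-in for
IS_REACHABLE(CONFIG_NF_CONNTRACK), and the mark is passed around as a plain
integer instead of living in a struct nf_conn. The function names are
invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The conditional lives inside the helper, so the helper always exists;
 * without conntrack it simply returns the mark unchanged.
 */
uint32_t update_ct_mark(uint32_t mark, uint8_t dscp)
{
	assert(dscp <= 0x3f);
#ifdef HAVE_CONNTRACK
	mark &= 0x80ffffffu;                    /* clear flag and DSCP bits */
	mark |= (uint32_t)(0x40u | dscp) << 24; /* set flag, store DSCP */
#endif
	return mark;
}

/* The call site needs no #if of its own. */
uint32_t select_path(uint32_t mark, uint8_t dscp, int fwmark_enabled)
{
	if (fwmark_enabled)
		mark = update_ct_mark(mark, dscp);
	return mark;
}
```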

>       }
>  
>       return &q->tins[tin];
> @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>                       q->rate_flags &= ~CAKE_FLAG_FWMARK;
>       }
>  
> +     if (tb[TCA_CAKE_ICING]) {
> +             if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
> +                     q->rate_flags |= CAKE_FLAG_ICING;
> +             else
> +                     q->rate_flags &= ~CAKE_FLAG_ICING;
> +     }
> +
>       if (q->tins) {
>               sch_tree_lock(sch);
>               cake_reconfigure(sch);
> @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>                       !!(q->rate_flags & CAKE_FLAG_FWMARK)))
>               goto nla_put_failure;
>  
> +     if (nla_put_u32(skb, TCA_CAKE_ICING,
> +                     !!(q->rate_flags & CAKE_FLAG_ICING)))
> +             goto nla_put_failure;
> +
>       return nla_nest_end(skb, opts);
>  
>  nla_put_failure:
> From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001
> From: Kevin Darbyshire-Bryant <l...@darbyshire-bryant.me.uk>
> Date: Wed, 27 Feb 2019 14:46:05 +0000
> Subject: [PATCH] cake: add fwmark & icing options
>
> Signed-off-by: Kevin Darbyshire-Bryant <l...@darbyshire-bryant.me.uk>
> ---
>  include/uapi/linux/pkt_sched.h |  2 ++
>  man/man8/tc-cake.8             | 19 ++++++++++++++++
>  tc/q_cake.c                    | 40 ++++++++++++++++++++++++++++++++++
>  3 files changed, 61 insertions(+)
>
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 01f96352..f2b1b270 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -954,6 +954,8 @@ enum {
>       TCA_CAKE_INGRESS,
>       TCA_CAKE_ACK_FILTER,
>       TCA_CAKE_SPLIT_GSO,
> +     TCA_CAKE_FWMARK,
> +     TCA_CAKE_ICING,
>       __TCA_CAKE_MAX
>  };
>  #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1)
> diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
> index eda436e1..626d4525 100644
> --- a/man/man8/tc-cake.8
> +++ b/man/man8/tc-cake.8
> @@ -73,6 +73,12 @@ TIME |
>  ]
>  .br
>  [
> +.BR fwmark
> +|
> +.BR nofwmark*
> +]
> +.br
> +[
>  .BR split-gso*
>  |
>  .BR no-split-gso
> @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be used as both source and
>  destination host.
>  
>  
> +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS
> +
> +In addition to TC FILTER tin classification, firewall marks may also optionally
> +be used.  The priority order (highest to lowest) for tin selection is TC filter,
> +firewall mark and then DSCP.
> +.PP
> +.B fwmark
> +
> +.br
> +     Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to the
> +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4.
> +Values outside the valid range are ignored and CAKE will fall back to using
> +DSCP for tin selection.

This should document the masking and shifting.
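For reference, the layout implied by the constants in the sch_cake.c patch
above (0x80ffffff, 0x40 << 24, >> 24 & 0x3f) can be sketched in userspace as
follows. The macro and function names are invented for this example; only
the bit positions come from the patch. Bits 24-29 hold the DSCP, bit 30 is
a "DSCP stored" flag, and bit 31 plus the low 24 bits are left untouched.

```c
#include <assert.h>
#include <stdint.h>

#define MARK_DSCP_SHIFT 24
#define MARK_DSCP_MASK  0x3f000000u /* bits 24-29: the 6-bit DSCP */
#define MARK_DSCP_VALID 0x40000000u /* bit 30: a DSCP has been stored */

/* Store a DSCP in the mark, preserving bit 31 and the low 24 bits. */
uint32_t mark_store_dscp(uint32_t mark, uint8_t dscp)
{
	assert(dscp <= 0x3f);
	mark &= ~(MARK_DSCP_MASK | MARK_DSCP_VALID); /* same as &= 0x80ffffff */
	return mark | MARK_DSCP_VALID | ((uint32_t)dscp << MARK_DSCP_SHIFT);
}

/* Return the stored DSCP, or -1 if the valid flag is not set. */
int mark_load_dscp(uint32_t mark)
{
	if (!(mark & MARK_DSCP_VALID))
		return -1;
	return (int)((mark & MARK_DSCP_MASK) >> MARK_DSCP_SHIFT);
}
```

Note that this is a different scheme from the "1 to number of tins" range
the man page text describes, which is the mismatch the documentation would
need to resolve.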


>  
>  .SH EXAMPLES
>  # tc qdisc delete root dev eth0
> diff --git a/tc/q_cake.c b/tc/q_cake.c
> index e827e3f1..fdafd3b7 100644
> --- a/tc/q_cake.c
> +++ b/tc/q_cake.c
> @@ -79,6 +79,8 @@ static void explain(void)
>  "                  dual-srchost | dual-dsthost | triple-isolate* ]\n"
>  "                [ nat | nonat* ]\n"
>  "                [ wash | nowash* ]\n"
> +"                [ icing | noicing* ]\n"
> +"                [ fwmark | nofwmark* ]\n"

Much as I appreciate the wordplay, I'm not sure this is actually going
to be super helpful for someone just trying to make sense of the
command output. Not sure I have a better suggestion, though...

-Toke