I wonder if you've dismissed eBPF too quickly. Reading around the subject,
that's the way the kernel seems to be going for both network actions and
various other purposes. I wonder if passing the info about cake could be
done via eBPF maps. I can't see your original eBPF example as it's
disappeared off GitHub.

John


On 08/03/2019 14:03, Kevin Darbyshire-Bryant wrote:
> 
> 
> OK, what I am trying to do is classify incoming connections into relevant 
> cake tins to impose some bandwidth fairness.  e.g. classify bittorrent & 
> things that are downloads into the Bulk tin, and prioritise stuff that is 
> streaming video into the Video tin. Incoming DSCP has a) been washed and b) 
> is unreliable anyway so is unhelpful in this case.  iptables runs too late, 
> so having rules to classify incoming stuff is pointless.
> 
> tc filters run early enough to use the tc skbedit major/minor number to 
> influence cake’s tin decisions.  But tc filters, a) don’t get to see 
> de-natted ipv4 addresses, b) daisy chain, so all filters must be traversed.  
> I can’t find my original tc filter ‘de-prio bittorrent’ but it was a very 
> simple ‘does this destination port match?, yes skbedit to select bulk tin’ - 
> I wanted to do more but the daisy chaining & lack of de-natting made this 
> technique useless.
> 
> Then I recently discovered act_connmark 
> (http://linux-ip.net/gl/tc-filters/tc-filters-node2.html) - the thinking 
> being I could use iptables on egress to set fwmarks to classify a connection 
> and have the ingress packets magically follow.  This worked but still 
> required 3 tc filter actions to cope with 4 tins:
> 
> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x01 fw action skbedit priority ${MAJOR}1
> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x03 fw action skbedit priority ${MAJOR}3
> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x04 fw action skbedit priority ${MAJOR}4
> 
> It also requires similar tc filters on the egress path in addition to the 
> iptables rules.
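
The three fw filters quoted above amount to a fixed fwmark-to-priority table. A minimal Python sketch of that table follows. It assumes cake only honours the priority when its major number matches the qdisc handle and its minor number lies between 1 and the tin count. The handle 0x800d and all function names are made up for illustration.

```python
# Model of 'skbedit priority ${MAJOR}N': a tc class id packs major:minor
# into 32 bits, major in the upper 16 bits and minor in the lower 16.

CAKE_HANDLE = 0x800D  # hypothetical qdisc major number, i.e. "800d:"
TIN_COUNT = 4         # four tins, as in the quoted setup


def classid(major: int, minor: int) -> int:
    """Pack a major:minor pair the way tc class ids are laid out."""
    return ((major & 0xFFFF) << 16) | (minor & 0xFFFF)


def priority_from_fwmark(fwmark: int, major: int = CAKE_HANDLE):
    """Mirror the three fw filters: marks 1, 3 and 4 set a priority.

    Any other mark matches no filter, so the priority is left unset (None)
    and cake falls back to its own classification.
    """
    if fwmark in (0x01, 0x03, 0x04):
        return classid(major, fwmark)
    return None


def tin_from_priority(priority, major: int = CAKE_HANDLE,
                      tin_count: int = TIN_COUNT):
    """Assumed cake check: return the 1-based tin number, or None."""
    if priority is None or (priority >> 16) != major:
        return None
    minor = priority & 0xFFFF
    return minor if 1 <= minor <= tin_count else None
```

A mark of 0x03 would therefore yield priority 0x800d0003 and select tin 3, while a mark of 0x02 matches no filter at all, which is presumably why three filters are enough for four tins.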
> 
> Could that be improved?  Yes, sort of. eBPF to the rescue-ish.  I could write 
> an eBPF classifier action program to directly copy the fwmark to the priority 
> field which cake would pick up.  I would have stopped there but as I’ve said 
> in a previous email, the eBPF needed to know (hard code) the cake instance 
> major numbers and there was the whole mystery tour of writing/building it.
> 
> The other problem with the above magic tin encode into the fwmark routine is 
> that it ignored any good citizens that were using the correct DSCP (e.g. 
> dropbear). I would need to write iptables rules to classify existing DSCP 
> codepoints into the matching tin for fwmark.  So ideally I needed the DSCP to 
> drive things and still act as a key into the fwmark mechanism.
> 
> The overriding (if required) of DSCP could be done in iptables and to avoid 
> going through the iptables DSCP decision/mangling for every packet I could 
> use a flag within the fwmark to indicate the decision had previously been 
> made and stored for this connection.
> 
> 
> The current rules are:
> 
>     # Configure iptables chain to mark packets
>     ipt -t mangle -N QOS_MARK_${IFACE}
> 
>     # Change DSCP of initial relevant hosts/packets - this will be picked up
>     # by cake+ and placed in the firewall connmark
>     # also the DSCP is used as the tin selector.
> 
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.5 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.5 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.10 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
> iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.10 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.12 -m udp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --dport 4443 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> 
> iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Bulk4  dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
> iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Vid4   dst -j DSCP --set-dscp-class CS2 -m comment --comment "Vid CS2 ipset"
> iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Voice4 dst -j DSCP --set-dscp-class CS6 -m comment --comment "Voice CS6 ipset"
> 
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p udp -s ::c/::ffff:ffff:ffff:ffff -m udp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --dport 4443 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> 
> ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Bulk6  dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
> ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Vid6 dst -j DSCP --set-dscp-class CS2 -m comment --comment "Vid CS2 ipset"
> ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Voice6 dst -j DSCP --set-dscp-class CS6 -m comment --comment "Voice CS6 ipset"
> 
>     # Send cake+ unmarked connections to the marking chain - Cake+ uses top byte as the
>     # i've been marked & here's the dscp placeholder.
>     # top 6 bits are DSCP, LSB is DSCP is valid flag
>     ipt -t mangle -A PREROUTING  -i $IFACE -m mark --mark 0x00/0x01000000 -g QOS_MARK_${IFACE}
>     ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00/0x01000000 -g QOS_MARK_${IFACE}
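
The mark layout described in the rule comments can be sketched in a few lines of Python. The bit positions are inferred from "top 6 bits are DSCP, LSB is DSCP is valid flag" together with the 0x01000000 match mask, so treat them as an assumption: DSCP in bits 31..26 and the valid flag in bit 24, with bit 25 assumed unused. All names are illustrative.

```python
DSCP_SHIFT = 26
DSCP_MASK = 0x3F << DSCP_SHIFT   # 0xFC000000, top 6 bits of the mark
VALID_FLAG = 0x01000000          # lowest bit of the top byte


def encode(mark: int, dscp: int) -> int:
    """Store a DSCP value in the mark and set the valid flag.

    Bits outside the DSCP field and the flag are left untouched.
    """
    cleared = mark & ~(DSCP_MASK | VALID_FLAG) & 0xFFFFFFFF
    return cleared | ((dscp & 0x3F) << DSCP_SHIFT) | VALID_FLAG


def decode(mark: int):
    """Return the stored DSCP, or None if the valid flag is not set."""
    if not mark & VALID_FLAG:
        return None
    return (mark & DSCP_MASK) >> DSCP_SHIFT


def needs_marking(mark: int) -> bool:
    """Equivalent of 'iptables -m mark --mark 0x00/0x01000000'."""
    return (mark & VALID_FLAG) == 0
```

Under this layout a connection classified as CS1 (DSCP 8) would carry the mark 0x21000000, and any mark with bit 24 clear is still sent through the QOS_MARK chain.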
> 
> 
> The initial egress packet for a connection will go through the above chain 
> (--mark 0x00/0x01000000 -g QOS_MARK_${IFACE}) where the DSCP value is changed 
> if required.
> 
> Cake will see this initial packet, inspect the fwmark, and because it hasn’t 
> been set will both copy the dscp into the mark and set the ‘fwdscp marked’ 
> bit.
> 
> Subsequent egress packets will go through neither the iptables DSCP mangle nor 
> the cake ‘update the fwmark’ routine.  Instead, cake will use the fwmark as 
> the tin selector.
> 
> 
> The ingress path is different.  First off act_connmark restores any 
> connection mark to the packet.  Cake will inspect the fwmark for the ‘fwdscp 
> marked’ bit.  If it is set, then the dscp coded in the firewall mark is used 
> for tin selection.  Optionally the encoded DSCP is restored to the packet’s 
> diffserv, but I personally don’t use that functionality as I’m only 
> interested in ’tin fair’ use of the link.  And that’s it.
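
The egress and ingress behaviour described above can be modelled as a small simulation. This is only a sketch under stated assumptions: the dictionary stands in for conntrack's per-connection mark, the mark is assumed to be restored to egress packets before the chain runs, the DSCP-to-tin table contains only the three classes named in the rule comments, and the fallback to the packet's own DSCP when no flag is set is a guess. None of the names come from cake itself.

```python
from dataclasses import dataclass

VALID_FLAG = 0x01000000   # 'fwdscp marked' bit (assumed position)
DSCP_SHIFT = 26           # DSCP kept in the top 6 bits (assumed)

# Illustrative mapping taken from the rule comments only.
TIN_BY_DSCP = {8: "Bulk", 16: "Video", 48: "Voice"}

connmarks = {}  # stand-in for the conntrack mark of each connection


@dataclass
class Packet:
    conn: str
    dscp: int = 0
    mark: int = 0


def tin_for(dscp: int) -> str:
    return TIN_BY_DSCP.get(dscp, "Best Effort")


def stored_dscp(mark: int) -> int:
    return (mark >> DSCP_SHIFT) & 0x3F


def egress(pkt: Packet, override=None) -> str:
    """First packet: optional DSCP override, then store DSCP and flag.

    Later packets skip both steps and use the stored mark.
    """
    pkt.mark = connmarks.get(pkt.conn, 0)  # assumed connmark restore
    if not pkt.mark & VALID_FLAG:
        if override is not None:           # the QOS_MARK chain decision
            pkt.dscp = override
        pkt.mark |= (pkt.dscp << DSCP_SHIFT) | VALID_FLAG
        connmarks[pkt.conn] = pkt.mark     # cake saves it per connection
    return tin_for(stored_dscp(pkt.mark))


def ingress(pkt: Packet) -> str:
    """act_connmark restores the mark; the stored DSCP picks the tin."""
    pkt.mark = connmarks.get(pkt.conn, 0)
    if pkt.mark & VALID_FLAG:
        return tin_for(stored_dscp(pkt.mark))
    return tin_for(pkt.dscp)               # assumed fallback
```

In this model, once the first egress packet of a connection has been classified as CS1, every later packet in either direction lands in the Bulk tin without revisiting the iptables rules, even if the inbound DSCP has been washed to zero.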
> 
> I’m doing 2 things.
> 
> 1) Classifying traffic into tins on ingress based on the egress DSCP values 
> contained in fwmarks.
> 2) Basing the fwmark contained DSCP on the initial packet of the connection, 
> possibly after being modified once by iptables rules.
> 
> 
>>
>> In particular, requirement 2 is why I'm pushing back against hard-coding
>> a mask anywhere…
> 
> I think with ‘fwmark mask’, ‘get_dscp’, ‘set_dscp’, ‘get_state mask’, 
> ‘set_state mask’ nothing *is* hard coded.
> 
>>
>> So could you maybe post your current ruleset and explain what it is you
>> are trying to achieve at a high level, and why? :)
> 
> I hope I’ve done that.
> 
>>
>> Also, you keep mentioning "must be lighter on CPU". Do you have any
>> performance numbers to show the impact of your current ruleset? Would be
>> easier to assess any performance impact if we have some baseline numbers
>> to compare against…
> 
> Let me see if I can quantify that in some way.
> 
>>
>> -Toke
> 
> 
> Cheers,
> 
> Kevin D-B
> 
> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
> 
> _______________________________________________
> Cake mailing list
> [email protected]
> https://lists.bufferbloat.net/listinfo/cake
> 