I wonder if you've dismissed eBPF too quickly. From reading around the subject, that's the way the kernel seems to be going, both for network actions and for various other purposes. I wonder if passing the info about cake could be done via eBPF maps. I can't see your original eBPF example as it has disappeared from GitHub.
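
To check I've read the mark layout in the quoted message correctly (top 6 bits of the fwmark hold the DSCP, the lowest bit of the top byte is the "DSCP is valid" flag, which is what `--mark 0x00/0x01000000` tests), here is a rough sketch of the arithmetic. The function names are made up for illustration, and the layout is only my reading of the comments in the ruleset, not something taken from the cake source.

```shell
#!/bin/sh
# Hypothetical helpers: the bit layout is an assumption based on the
# ruleset comments ("top 6 bits are DSCP, LSB is DSCP is valid flag").
VALID_FLAG=$((0x01000000))   # lowest bit of the top byte of the mark
DSCP_SHIFT=26                # DSCP sits in bits 31..26 of the mark

# encode_mark DSCP: pack a 6-bit DSCP value plus the valid flag into a mark
encode_mark() {
    printf '0x%08x\n' $(( (($1 & 0x3f) << $DSCP_SHIFT) | $VALID_FLAG ))
}

# decode_dscp MARK: recover the 6-bit DSCP value stored in a mark
decode_dscp() {
    echo $(( ($1 >> $DSCP_SHIFT) & 0x3f ))
}

# is_marked MARK: succeed if the "DSCP is valid" flag is set
is_marked() {
    [ $(( $1 & $VALID_FLAG )) -ne 0 ]
}

encode_mark 8            # CS1 is DSCP 8, prints 0x21000000
decode_dscp 0x41000000   # prints 16, i.e. CS2
```

An unmarked connection (mark 0) fails `is_marked`, which corresponds to the packets the `-m mark --mark 0x00/0x01000000` rules send to the marking chain.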
John

On 08/03/2019 14:03, Kevin Darbyshire-Bryant wrote:
>
>
> OK, what I am trying to do is classify incoming connections into relevant
> cake tins to impose some bandwidth fairness. e.g. classify bittorrent &
> things that are downloads into the Bulk tin, and prioritise stuff that is
> streaming video into the Video tin. Incoming DSCP has a) been washed and b)
> is unreliable anyway so is unhelpful in this case. iptables runs too late,
> so having rules to classify incoming stuff is pointless.
>
> tc filters run early enough to use the tc skbedit major/minor number to
> influence cake’s tin decisions. But tc filters, a) don’t get to see
> de-natted ipv4 addresses, b) daisy chain, so all filters must be traversed.
> I can’t find my original tc filter ‘de-prio bittorrent’ but it was a very
> simple ‘does this destination port match?, yes skbedit to select bulk tin’ -
> I wanted to do more but the daisy chaining & lack of de-natting made this
> technique useless.
>
> Then I recently discovered act_connmark
> (http://linux-ip.net/gl/tc-filters/tc-filters-node2.html) - the thinking
> being I could use iptables on egress to set fwmarks to classify a connection
> and have the ingress packets magically follow. This worked but still
> required 3 tc filter actions to cope with 4 tins:
>
> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x01 fw action skbedit priority ${MAJOR}1
> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x03 fw action skbedit priority ${MAJOR}3
> $TC filter add dev $IFACE parent $MAJOR protocol ip handle 0x04 fw action skbedit priority ${MAJOR}4
>
> It also requires similar tc filters on the egress path in addition to the
> iptables rules.
>
> Could that be improved? Yes, sort of. eBPF to the rescue-ish. I could write
> an eBPF classifier action program to directly copy the fwmark to the priority
> field which cake would pick up.
> I would have stopped there but as I’ve said
> in a previous email, the eBPF needed to know (hard code) the cake instance
> major numbers and there was the whole mystery tour of writing/building it.
>
> The other problem with the above magic tin encode into the fwmark routine is
> that it ignored any good citizens that were using the correct DSCP (e.g.
> dropbear). I would need to write iptables rules to classify existing DSCP
> codepoints into the matching tin for fwmark. So ideally I needed the DSCP to
> drive things and still act as a key into the fwmark mechanism.
>
> The overriding (if required) of DSCP could be done in iptables and to avoid
> going through the iptables DSCP decision/mangling for every packet I could
> use a flag within the fwmark to indicate the decision had previously been
> made and stored for this connection.
>
>
> The current rules are:
>
> # Configure iptables chain to mark packets
> ipt -t mangle -N QOS_MARK_${IFACE}
>
> # Change DSCP of initial relevant hosts/packets - this will be picked up by cake+ and placed in the firewall connmark
> # also the DSCP is used as the tin selector.
>
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.5 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.5 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.10 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
> iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.10 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.12 -m udp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --dport 4443 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12 -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
>
> iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Bulk4 dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
> iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Vid4 dst -j DSCP --set-dscp-class CS2 -m comment --comment "Vid CS2 ipset"
> iptables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Voice4 dst -j DSCP --set-dscp-class CS6 -m comment --comment "Voice CS6 ipset"
>
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p udp -s ::c/::ffff:ffff:ffff:ffff -m udp --sport 6981 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --dport 4443 -m comment --comment "BT DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
> ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
>
> ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Bulk6 dst -j DSCP --set-dscp-class CS1 -m comment --comment "Bulk CS1 ipset"
> ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Vid6 dst -j DSCP --set-dscp-class CS2 -m comment --comment "Vid CS2 ipset"
> ip6tables -t mangle -A QOS_MARK_${IFACE} -m set --match-set Voice6 dst -j DSCP --set-dscp-class CS6 -m comment --comment "Voice CS6 ipset"
>
> # Send cake+ unmarked connections to the marking chain - Cake+ uses top byte as the
> # i've been marked & here's the dscp placeholder.
> # top 6 bits are DSCP, LSB is DSCP is valid flag
> ipt -t mangle -A PREROUTING -i $IFACE -m mark --mark 0x00/0x01000000 -g QOS_MARK_${IFACE}
> ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00/0x01000000 -g QOS_MARK_${IFACE}
>
>
> The initial egress packet for a connection will go through the above chain
> (--mark 0x00/0x01000000 -g QOS_MARK_${IFACE}) where the DSCP value is changed
> if required.
>
> Cake will see this initial packet, inspect the fwmark, and because it hasn’t
> been set will both copy the dscp into the mark and set the ‘fwdscp marked’
> bit.
>
> Subsequent egress packets will neither go through the iptables DSCP mangle or
> the cake ‘update the fwmark’ routine. Instead, cake will use the fwmark as
> the tin selector.
>
>
> The ingress path is different. First off act_connmark restores any
> connection mark to the packet. Cake will inspect the fwmark for the ‘fwdscp
> marked’ bit. If it is set, then the dscp coded in the firewall mark is used
> for tin selection.
> Optionally the encoded DSCP is restored to the packet’s
> diffserv, but I personally don’t use that functionality as I’m only
> interested in ’tin fair’ use of the link. And that’s it.
>
> I’m doing 2 things.
>
> 1) Classifying traffic into tins on ingress based on the egress DSCP values
> contained in fwmarks.
> 2) Basing the fwmark contained DSCP on the initial packet of the connection,
> possibly after being modified once by iptables rules.
>
>
>>
>> In particular, requirement 2 is why I'm pushing back against hard-coding
>> a mask anywhere…
>
> I think with ‘fwmark mask’, ‘get_dscp’, ’set_dscp’, ‘get_state mask’,
> ’set_state mask’ nothing *is* hard coded.
>
>>
>> So could you maybe post your current ruleset and explain what it is you
>> are trying to achieve at a high level, and why? :)
>
> I hope I’ve done that.
>
>>
>> Also, you keep mentioning "must be lighter on CPU". Do you have any
>> performance numbers to show the impact of your current ruleset? Would be
>> easier to assess any performance impact if we have some baseline numbers
>> to compare against…
>
> Let me see if I can quantify that in some way.
>
>>
>> -Toke
>
>
> Cheers,
>
> Kevin D-B
>
> 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A
>
> _______________________________________________
> Cake mailing list
> [email protected]
> https://lists.bufferbloat.net/listinfo/cake
>
_______________________________________________
Cake mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cake
