Re: PATCH: Dummy as IMQ replacement

jamal Fri, 30 Dec 2005 08:22:55 -0800

On Wed, 2005-28-12 at 10:46 -0500, jamal wrote:
> On Wed, 2005-28-12 at 10:39 -0500, jamal wrote:
> [..]
> > I am going to
> > send an iproute2 patch that is only documentation on usage etc.
> > 
> 
> Attached.


A new version attached. Actually all it is is a 
s/dummy/ifb/g

cheers,
jamal

Documentation on a) generic intro to actions and their usage
b) using ifb in place of IMQ

Signed-off-by: Jamal Hadi Salim <[EMAIL PROTECTED]>
---

diff --git a/doc/actions/actions-general b/doc/actions/actions-general
new file mode 100644
index 0000000..bb2295d
--- /dev/null
+++ b/doc/actions/actions-general
@@ -0,0 +1,254 @@
+
+This document is slightly dated but should give you an idea of how things
+work.
+
+What is it?
+-----------
+
+An extension to the filtering/classification architecture of Linux Traffic
+Control. 
+Up to 2.6.8 the only action that could be "attached" to a filter was policing. 
+i.e you could say something like:
+
+-----
+tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \
+127.0.0.1/32 flowid 1:1 police mtu 4000 rate 1500kbit burst 90k
+-----
+
+which implies "if a packet is seen on the ingress of the lo device with
+a source IP address of 127.0.0.1/32 we give it a classification id  of 1:1 and
+we execute a policing action which rate limits its bandwidth utilization 
+to 1.5Mbps".
+
+The new extensions allow for more than just policing actions to be added.
+They are also fully backward compatible. If you have a kernel that doesnt
+understand them, then the effect is null i.e if you have a newer tc
+but older kernel, the actions are not installed. Likewise if you
+have a newer kernel but older tc, obviously the tc will use current
+syntax which will work fine. Of course to get the required effect you need
+both newer tc and kernel. If you are reading this you have the
+right tc ;->
+
+A side effect is that we can now get stateless firewalling to work with tc. 
+Essentially this is now an alternative to iptables.
+I wont go into details of my dislike for iptables at times, but 
+scalability is one of the main issues; however, if you need stateful
+classification - use netfilter (for now).
+
+This stuff works on both ingress and egress qdiscs.
+
+Features
+--------
+
+1) new additional syntax and actions enabled. Note old syntax is still valid.
+
+Essentially this is still the same syntax as tc with a new construct
+"action". The syntax is of the form:
+tc filter add <DEVICE> parent 1:0 protocol ip prio 10 <Filter description>
+flowid 1:1 action <ACTION description>*
+
+You can have as many actions as you want (within sensible reasoning).
+
+In the past the only real action was the policer; i.e you could do something
+along the lines of:
+tc filter add dev lo parent ffff: protocol ip prio 10 u32 \
+match ip src 127.0.0.1/32 flowid 1:1 \
+police mtu 4000 rate 1500kbit burst 90k
+
+Although you can still use the same syntax, now you can say:
+
+tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \
+match ip src 127.0.0.1/32 flowid 1:1 \
+action police mtu 4000 rate 1500kbit burst 90k
+
+"Generic actions" (gact) at the moment are: 
+{ drop, pass, reclassify, continue}
+(If you have others not listed here, give me a reason and we will add them)
++drop says to drop the packet
++pass says to accept it
++reclassify requests for reclassification of the packet
++continue requests for next lookup to match
+
+2) In order to take advantage of some of the targets written by the
+iptables people, a classifier can have a packet massaged by an
+iptables target. I have only tested with mangle targets up to now
+(in fact anything that is not in the mangle table is disabled right now).
+
+In terms of hooks:
+*ingress is mapped to pre-routing hook
+*egress is mapped to post-routing hook
+I dont see much value in the other hooks, if you see it and email me good
+reasons, the addition is trivial.
+
+Example syntax for iptables targets usage becomes:
+tc filter add ..... u32 <u32 syntax> action ipt -j <iptables target syntax>
+
+example:
+tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
+match ip dst 127.0.0.8/32 flowid 1:12 \
+action ipt -j MARK --set-mark 2
+
+3) A feature i call pipe
+The motivation is derived from Unix pipe mechanism but applied to packets.
+Essentially take a matching packet and pass it through 
+action1 | action2 | action3 etc.
+You could do something similar to this with the tc policer and the "continue"
+operator but this rather restricts it to just the policer and requires 
+multiple rules (and lookups, hence quite inefficient).
+
+The following is an example -- and please note that this is just an example,
+_not_ The Word Youve Been Waiting For (yes, i have had problems giving examples
+which ended up becoming dogma in documents, with people modifying them a
+little to look clever).
+
+i selected the metering rates to be small so that i can show better how 
+things work.
+ 
+The script below does the following: 
+- an incoming packet from 10.0.0.21 is first given a firewall mark of 1. 
+
+- It is then metered to make sure it does not exceed its allocated rate of 
+1Kbps. If it doesnt exceed rate, this is where we terminate action execution.
+
+- If it does exceed its rate, its "color" changes to a mark of 2 and it is 
+then passed through a second meter.
+
+- The second meter is shared across all flows on that device [i am surprised 
+that this seems not to be a well-known feature of the policer; Bert was telling 
+me that someone was writing a qdisc just to do sharing across multiple devices;
+it must be the summer heat again; weve had someone doing that every year around
+summer -- the key to sharing is to use an operator "index" in your policer 
+rules (example "index 20"). All your rules have to use the same index to 
+share.]
+ 
+- If the second meter is exceeded the color of the flow changes further to 3.
+
+- We then pass the packet to another meter which is shared across all devices
+in the system. If this meter is exceeded we drop the packet.
+
+Note the mark can be used further up the system to do things like policy 
+or more interesting things on the egress.
+
+------------------ cut here -------------------------------
+#
+# Add an ingress qdisc on eth0
+tc qdisc add dev eth0 ingress
+#
+# The filter below runs these actions in order on an incoming packet
+# from 10.0.0.21:
+# 1) first give it a mark of 1
+# 2) then pass it through a policer which allows 1kbps; if the flow
+#    doesnt exceed that rate, this is where we stop; if it exceeds, we
+#    pipe the packet to the next action
+# 3) which marks the packet fwmark as 2 and pipes
+# 4) next attempt to borrow b/width from a meter used across all flows
+#    incoming on eth0 ("index 30"); if that is exceeded we pipe to the
+#    next action
+# 5) mark it as fwmark 3 if exceeded
+# 6) and then attempt to borrow from a meter used by all devices in the
+#    system ("index 20"). Should this be exceeded, drop the packet on
+#    the floor.
+tc filter add dev eth0 parent ffff: protocol ip prio 1 \
+u32 match ip src 10.0.0.21/32 flowid 1:15 \
+action ipt -j MARK --set-mark 1 index 2 \
+action police rate 1kbit burst 9k pipe \
+action ipt -j MARK --set-mark 2 \
+action police index 30 mtu 5000 rate 1kbit burst 10k pipe \
+action ipt -j MARK --set-mark 3 \
+action police index 20 mtu 5000 rate 1kbit burst 90k drop
+--------------------------------- 
+
+Now lets see the actions installed with 
+"tc filter show parent ffff: dev eth0"
+
+-------- output -----------
+jroot# tc filter show parent ffff: dev eth0
+filter protocol ip pref 1 u32 
+filter protocol ip pref 1 u32 fh 800: ht divisor 1 
+filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15 
+
+   action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING 
+        target MARK set 0x1  index 2
+
+   action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb 
+
+   action order 3: tablename: mangle  hook: NF_IP_PRE_ROUTING 
+        target MARK set 0x2  index 1
+
+   action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b 
+
+   action order 5: tablename: mangle  hook: NF_IP_PRE_ROUTING 
+        target MARK set 0x3  index 3
+
+   action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b 
+
+  match 0a000015/ffffffff at 12
+-------------------------------
+
+Note the ordering of the actions is based on the order in which we entered
+them. In the future i will add explicit priorities.
+
+Now lets run a ping -f from 10.0.0.21 to this host; stop the ping after
+you see a few lines of dots
+
+----
[EMAIL PROTECTED] hadi]# ping -f  10.0.0.22
+PING 10.0.0.22 (10.0.0.22): 56 data bytes
+....................................................................................................................................................................................................................................................................................................................................................................................................................................................
+--- 10.0.0.22 ping statistics ---
+2248 packets transmitted, 1811 packets received, 19% packet loss
+round-trip min/avg/max = 0.7/9.3/20.1 ms
+-----------------------------
+
+Now lets take a look at the stats with "tc -s filter show parent ffff: dev eth0"
+
+--------------
+jroot# tc -s filter show parent ffff: dev eth0
+filter protocol ip pref 1 u32 
+filter protocol ip pref 1 u32 fh 800: ht divisor 1 
+filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15 
+
+   action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING 
+        target MARK set 0x1  index 2
+         Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0) 
+
+   action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb 
+         Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122) 
+
+   action order 3: tablename: mangle  hook: NF_IP_PRE_ROUTING 
+        target MARK set 0x2  index 1
+         Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0) 
+
+   action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b 
+         Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945) 
+
+   action order 5: tablename: mangle  hook: NF_IP_PRE_ROUTING 
+        target MARK set 0x3  index 3
+         Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0) 
+
+   action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b 
+         Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437) 
+
+  match 0a000015/ffffffff at 12
+-------------------------------
+
+Neat, eh?
+
+
+Wanna write an action module?
+------------------------------
+Its easy. Either look at the code or send me email. I will document at
+some point; will also accept documentation.
+
+TODO
+----
+
+Lotsa goodies/features coming. Requests also being accepted.
+At the moment the focus has been on getting the architecture in place.
+Expect new things in the spare time i have to work on this
+(particularly around end of year when i typically get time off
+from work).
+
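A quick sanity check on the stats dump in actions-general above. This is only a sketch: the variable and function names are made up for illustration, and the numbers are copied from the dump and from the ping summary. It decodes the u32 key shown on the "match" line and re-derives the ping loss from the policer counters:

```shell
#!/bin/sh
# Counters copied from the "tc -s filter show" dump; names are illustrative.
sent=2248           # pkts seen by action order 1 (first MARK)
bytes_sent=188832   # bytes seen by action order 1
over_police1=2122   # overlimits at "police 1"  -> piped to action order 3
over_police30=1945  # overlimits at "police 30" -> piped to action order 5
over_police20=437   # overlimits at "police 20" -> dropped

# Turn a u32 key such as 0a000015 into a dotted-quad address.
u32_key_to_ip() {
    key=$1
    a=$(printf '%s' "$key" | cut -c1-2)
    b=$(printf '%s' "$key" | cut -c3-4)
    c=$(printf '%s' "$key" | cut -c5-6)
    d=$(printf '%s' "$key" | cut -c7-8)
    echo "$((0x$a)).$((0x$b)).$((0x$c)).$((0x$d))"
}

# Only the last policer drops, so its overlimits are the lost pings.
received=$((sent - over_police20))
loss_pct=$((over_police20 * 100 / sent))   # integer division
bytes_per_pkt=$((bytes_sent / sent))

echo "match key 0a000015 at 12 = src $(u32_key_to_ip 0a000015)"
echo "marked 2 after police 1:  $over_police1 pkts"
echo "marked 3 after police 30: $over_police30 pkts"
echo "received: $received of $sent pkts, loss ${loss_pct}%"
echo "bytes per packet: $bytes_per_pkt"
```

Each MARK action sees exactly the packets the policer before it counted as overlimits (2122, then 1945). The 437 overlimits at "police 20" account for the 1811 received and 19% loss that ping reported. The 84 bytes per packet are the 56 data bytes of the ping plus 8 bytes of ICMP header and 20 bytes of IP header, and offset 12 in the IP header is where the source address sits.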
diff --git a/doc/actions/ifb-README b/doc/actions/ifb-README
new file mode 100644
index 0000000..3ef9f21
--- /dev/null
+++ b/doc/actions/ifb-README
@@ -0,0 +1,155 @@
+
+Advantage over current IMQ: cleaner, in particular on SMP,
+with a _lot_ less code.
+Old Dummy device functionality is preserved while new one only
+kicks in if you use actions.
+
+IMQ USES
+--------
+As far as i know the reasons listed below are why people use IMQ. 
+It would be nice to know of anything else that i missed.
+
+1) qdiscs/policies that are per device as opposed to system wide.
+IMQ allows for sharing.
+
+2) Allows for queueing incoming traffic for shaping instead of
+dropping. I am not aware of any study that shows policing is 
+worse than shaping in achieving the end goal of rate control.
+I would be interested if anyone is experimenting.
+
+3) Very interesting use: if you are serving p2p you may wanna give 
+preference to your own locally originated traffic (when responses come back)
+vs someone using your system to do bittorrent. So QoSing based on state
+comes in as the solution. What people did to achieve this was stick
+the IMQ somewhere at the prelocal hook.
+I think this is a pretty neat feature to have in Linux in general.
+(i.e not just for IMQ).
+But i wont go back to putting netfilter hooks in the device to satisfy
+this.  I also dont think its worth it hacking ifb some more to be 
+aware of say L3 info and play ip rule tricks to achieve this.
+--> Instead the plan is to have a conntrack related action. This action will
+selectively either query/create conntrack state on incoming packets. 
+Packets could then be redirected to ifb based on what happens -> eg 
+on incoming packets; if we find they are of known state we could send to 
+a different queue than one which didnt have existing state. This
+all however is dependent on whatever rules the admin enters.
+
+At the moment this function does not exist yet. I have decided instead
+of sitting on the patch to release it and then if theres pressure i will
+add this feature.
+
+What you can do with ifb currently with actions
+--------------------------------------------------
+
+Lets say you are policing packets from alias 192.168.200.200/32 and
+you dont want those to exceed 100kbps going out.
+
+tc filter add dev eth0 parent 1: protocol ip prio 10 u32 \
+match ip src 192.168.200.200/32 flowid 1:2 \
+action police rate 100kbit burst 90k drop
+
+If you run tcpdump on eth0 you will see all packets going out
+with src 192.168.200.200/32, dropped or not.
+Extend the rule a little to see only the ones that made it out:
+
+tc filter add dev eth0 parent 1: protocol ip prio 10 u32 \
+match ip src 192.168.200.200/32 flowid 1:2 \
+action police rate 100kbit burst 90k drop \
+action mirred egress mirror dev ifb0 
+
+Now fire tcpdump on ifb0 to see only those packets ..
+tcpdump -n -i ifb0 -x -e -t 
+
+Essentially a good debugging/logging interface.
+
+If you replace mirror with redirect, those packets will be
+blackholed and will never make it out. This redirect behavior
+changes with new patch (but not the mirror). 
+
+Below is what you can do with the patch to provide the functionality
+that most people use IMQ for:
+
+--------
+export TC="/sbin/tc"
+
+$TC qdisc add dev ifb0 root handle 1: prio 
+$TC qdisc add dev ifb0 parent 1:1 handle 10: sfq
+$TC qdisc add dev ifb0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
+$TC qdisc add dev ifb0 parent 1:3 handle 30: sfq
+$TC filter add dev ifb0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
+$TC filter add dev ifb0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
+
+ifconfig ifb0 up
+
+$TC qdisc add dev eth0 ingress
+
+# redirect all IP packets arriving in eth0 to ifb0 
+# use mark 1 --> puts them onto class 1:1
+$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
+match u32 0 0 flowid 1:1 \
+action ipt -j MARK --set-mark 1 \
+action mirred egress redirect dev ifb0
+
+--------
+
+
+Run a little test:
+
+From another machine, ping so that you have packets going into the box:
+-----
[EMAIL PROTECTED] action-tests]# ping 10.22
+PING 10.22 (10.0.0.22): 56 data bytes
+64 bytes from 10.0.0.22: icmp_seq=0 ttl=64 time=2.8 ms
+64 bytes from 10.0.0.22: icmp_seq=1 ttl=64 time=0.6 ms
+64 bytes from 10.0.0.22: icmp_seq=2 ttl=64 time=0.6 ms
+
+--- 10.22 ping statistics ---
+3 packets transmitted, 3 packets received, 0% packet loss
+round-trip min/avg/max = 0.6/1.3/2.8 ms
[EMAIL PROTECTED] action-tests]# 
+-----
+Now look at some stats:
+
+---
[EMAIL PROTECTED]:~# $TC -s filter show parent ffff: dev eth0
+filter protocol ip pref 10 u32 
+filter protocol ip pref 10 u32 fh 800: ht divisor 1 
+filter protocol ip pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 
+  match 00000000/00000000 at 0
+        action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING 
+        target MARK set 0x1  
+        index 1 ref 1 bind 1 installed 4195sec  used 27sec 
+         Sent 252 bytes 3 pkts (dropped 0, overlimits 0) 
+
+        action order 2: mirred (Egress Redirect to device ifb0) stolen
+        index 1 ref 1 bind 1 installed 165 sec used 27 sec
+         Sent 252 bytes 3 pkts (dropped 0, overlimits 0) 
+
[EMAIL PROTECTED]:~# $TC -s qdisc
+qdisc sfq 30: dev ifb0 limit 128p quantum 1514b 
+ Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 
+qdisc tbf 20: dev ifb0 rate 20Kbit burst 1575b lat 2147.5s 
+ Sent 210 bytes 3 pkts (dropped 0, overlimits 0) 
+qdisc sfq 10: dev ifb0 limit 128p quantum 1514b 
+ Sent 294 bytes 3 pkts (dropped 0, overlimits 0) 
+qdisc prio 1: dev ifb0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
+ Sent 504 bytes 6 pkts (dropped 0, overlimits 0) 
+qdisc ingress ffff: dev eth0 ---------------- 
+ Sent 308 bytes 5 pkts (dropped 0, overlimits 0) 
+
[EMAIL PROTECTED]:~# ifconfig ifb0
+ifb0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00  
+          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
+          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
+          RX packets:6 errors:0 dropped:3 overruns:0 frame:0
+          TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
+          collisions:0 txqueuelen:32 
+          RX bytes:504 (504.0 b)  TX bytes:252 (252.0 b)
+-----
+
+Dummy continues to behave like it always did.
+If you send it any packet not originating from the actions, it will drop it.
+[In this case the three dropped packets were ipv6 ndisc].
+
+cheers,
+jamal
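Since the ifb-README setup above goes through $TC, one way to preview it without root or an ifb device is to point TC at echo. This is a sketch under assumptions: the function name setup_ifb is made up for illustration, the "ifconfig ifb0 up" step is left out of the preview, and it relies on an sh-style shell splitting the unquoted $TC into "echo" plus its first argument.

```shell
#!/bin/sh
# Dry run: every "$TC ..." line only prints the command it would have run.
TC="echo /sbin/tc"

# setup_ifb is a hypothetical wrapper around the commands from the README.
setup_ifb() {
    $TC qdisc add dev ifb0 root handle 1: prio
    $TC qdisc add dev ifb0 parent 1:1 handle 10: sfq
    $TC qdisc add dev ifb0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
    $TC qdisc add dev ifb0 parent 1:3 handle 30: sfq
    $TC filter add dev ifb0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
    $TC filter add dev ifb0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
    $TC qdisc add dev eth0 ingress
    # redirect all IP packets arriving in eth0 to ifb0, marked 1
    $TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
        match u32 0 0 flowid 1:1 \
        action ipt -j MARK --set-mark 1 \
        action mirred egress redirect dev ifb0
}

setup_ifb
```

Setting TC back to "/sbin/tc" runs the same lines for real. Each continued command is printed as a single line, which makes it easy to diff against what "tc filter show" reports afterwards.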
diff --git a/doc/actions/mirred-usage b/doc/actions/mirred-usage
index 3e135a0..aa942e5 100644
--- a/doc/actions/mirred-usage
+++ b/doc/actions/mirred-usage
@@ -66,6 +66,6 @@ action mirred egress mirror dev eth1
 ---
 
 A more interesting example is when you mirror flows to a ifb device
-so you could tcpdump them (ifb by defaults drops all devices it sees).
+so you could tcpdump them (ifb by default drops all packets it sees).
 This is a very useful debug feature.
