Re: [Babel-users] the routing atomic update wet paint - because *I* care
Sorry for the self-reply. That's what you get when you send e-mail at the first crack of noon.

> What about putting multiple netlink messages in one datagram? Will perhaps do that. :-) Remark: error handling becomes more tricky -- but for just 2 messages it should be fine.

Do you think that the current netlink code is too simple? (I'm seeing Grégoire Henry today, he's the original author of Babel's netlink code, I'm sure he'll be surprised.)

Please do not make this code more complex without experimental evidence that it's needed. On the other hand, writing a patch to do atomic updates is exactly what's needed in order to gather said experimental data. So I'll be very grateful if somebody produces said patch, on the understanding that I might decide not to apply it to mainline.

-- Juliusz
___
Babel-users mailing list
Babel-users@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/babel-users
Re: [Babel-users] the routing atomic update wet paint - because *I* care
If you work with atomic route replacement, even putting ALL of them into a netlink message (or as many as you can fit in) works.

Henning

On Tue, Apr 7, 2015 at 12:19 PM, Matthieu Boutier bout...@pps.univ-paris-diderot.fr wrote:
> > I agree, but I would like to know how many packets we lose. Since the remove/insert happen in quick succession, I'd expect it to be very few.
>
> > … and the context switch and what little work it does alone - costs 80Mbits of forwarding,
>
> What about putting multiple netlink messages in one datagram? Will perhaps do that. :-) Remark: error handling becomes more tricky -- but for just 2 messages it should be fine.
>
> Matthieu
Re: [Babel-users] the routing atomic update wet paint - because *I* care
On Tue, Apr 7, 2015 at 3:57 PM, Dave Taht dave.t...@gmail.com wrote:
> > Interface index is not a problem... metric-change is.
>
> I am sorry, I do not understand, once again.

If the route has the same destination and metric, you will overwrite it with an atomic update, regardless of the outgoing interface. So you can use atomic updates to switch a route to a different interface. You cannot do so to change the metric value of the route.

At least that was (I think) the state with kernel 3.18/3.19.

Henning
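The rule described above can be written as a small predicate. This is only a minimal sketch, not babeld or olsrd2 code: `struct route_desc` and `same_route_key` are hypothetical names, and the key (destination, table, metric) is taken from the thread's description of kernels 3.18/3.19 rather than checked against the kernel source.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified description of one kernel route
   (an assumption for illustration, not a babeld type). */
struct route_desc {
    unsigned char dest[16];   /* destination prefix, IPv6 or v4-mapped */
    unsigned short plen;      /* prefix length */
    int table;                /* routing table id */
    int metric;               /* kernel route priority */
    unsigned char gate[16];   /* next-hop address */
    int ifindex;              /* outgoing interface */
};

/* True when both routes share the (destination, table, metric) key, so a
   single replace request could overwrite the old route. The gateway and
   the interface are deliberately not compared. */
static bool
same_route_key(const struct route_desc *old, const struct route_desc *cand)
{
    return old->plen == cand->plen &&
        memcmp(old->dest, cand->dest, 16) == 0 &&
        old->table == cand->table &&
        old->metric == cand->metric;
}
```

When the predicate is false, for example because the metric changes, a daemon would presumably have to fall back to a delete followed by an add.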
Re: [Babel-users] the routing atomic update wet paint - because *I* care
> If you work with atomic route replacement even putting ALL of them into a netlink message (or as many as you can fit in) works.

What I understand is that we can't (in general) work with atomic *next-hop* replacement (interface index and metric may change).

I proposed a workaround where instead of using two distinct messages for del(r) and add(r) we use one message with del(r); add(r). Even if it's not necessarily atomic (is it?), it should be faster (only one system call, since it was what frightened Dave).

Matthieu
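The del(r); add(r) idea amounts to laying two netlink messages back to back in one buffer and handing that buffer to the kernel with a single send. The sketch below only builds and walks such a buffer; the helper names are hypothetical, the route attributes are left out, and the actual send on a NETLINK_ROUTE socket (which needs privileges) is omitted. Whether the kernel applies the pair atomically is exactly the open question in this thread, so the code makes no claim about that.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Append one route request (header plus rtmsg, no attributes) at offset
   off. buf must be suitably aligned for struct nlmsghdr.
   Returns the new offset, or 0 if the message does not fit. */
static size_t
append_route_msg(char *buf, size_t buflen, size_t off,
                 unsigned short type, unsigned short flags, unsigned int seq)
{
    size_t len = NLMSG_LENGTH(sizeof(struct rtmsg));
    struct nlmsghdr *nh;
    struct rtmsg *rtm;

    if(off + NLMSG_ALIGN(len) > buflen)
        return 0;

    nh = (struct nlmsghdr *)(buf + off);
    memset(nh, 0, NLMSG_ALIGN(len));
    nh->nlmsg_len = len;
    nh->nlmsg_type = type;
    nh->nlmsg_flags = flags;
    nh->nlmsg_seq = seq;

    rtm = NLMSG_DATA(nh);
    rtm->rtm_family = AF_INET6;
    rtm->rtm_table = RT_TABLE_MAIN;
    /* A real request would append RTA_DST, RTA_GATEWAY, RTA_OIF and
       RTA_PRIORITY attributes here and grow nlmsg_len accordingly. */
    return off + NLMSG_ALIGN(len);
}

/* Pack del(r); add(r) into one buffer. Returns the total length, or 0. */
static size_t
build_del_add(char *buf, size_t buflen)
{
    size_t off = append_route_msg(buf, buflen, 0, RTM_DELROUTE,
                                  NLM_F_REQUEST | NLM_F_ACK, 1);
    if(off == 0)
        return 0;
    return append_route_msg(buf, buflen, off, RTM_NEWROUTE,
                            NLM_F_REQUEST | NLM_F_ACK |
                            NLM_F_CREATE | NLM_F_EXCL, 2);
}

/* Count the netlink messages found in the first len bytes of buf. */
static int
count_msgs(char *buf, size_t len)
{
    struct nlmsghdr *nh = (struct nlmsghdr *)buf;
    int remaining = (int)len;
    int n = 0;

    while(NLMSG_OK(nh, remaining)) {
        n++;
        nh = NLMSG_NEXT(nh, remaining);
    }
    return n;
}
```

Each message carries its own sequence number and NLM_F_ACK so that replies can be matched to requests, which is where the trickier error handling mentioned above would come in.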
Re: [Babel-users] the routing atomic update wet paint - because *I* care
On Tue, Apr 7, 2015 at 6:35 AM, Henning Rogge hro...@gmail.com wrote:
> On Tue, Apr 7, 2015 at 3:19 PM, Matthieu Boutier bout...@pps.univ-paris-diderot.fr wrote:
> > > If you work with atomic route replacement even putting ALL of them into a netlink message (or as many as you can fit in) works.
> >
> > What I understand is that we can't (in general) work with atomic *next-hop* replacement (interface index and metric may change).
>
> Interface index is not a problem... metric-change is.

I am sorry, I do not understand, once again.

> > I proposed a workaround where instead of using two distinct messages for del(r) and add(r) we use one message with del(r); add(r). Even if it's not necessarily atomic (is it?), it should be faster (only one system call, since it was what frightened Dave).

It was the changes to do RCU in the new FIB routing code that frightened me. RCU runs asynchronously. There is also the ongoing work to use these APIs to reprogram (open)switch hardware.

All I can do is test, and measure, on the gear I got, on the upcoming kernels.

> Henning

-- 
Dave Täht
We CAN make better hardware, ourselves, beat bufferbloat, and take back control of the edge of the internet! If we work together, on making it:
https://www.kickstarter.com/projects/onetswitch/onetswitch-open-source-hardware-for-networking
Re: [Babel-users] the routing atomic update wet paint - because *I* care
> All 3 of those are GPL, AFAICT? That doesn't make for a good reference if you want a permissive license for your code.

Agreed. (And agreed with Henning, we can look, we just cannot touch.)

-- Juliusz
Re: [Babel-users] the routing atomic update wet paint - because *I* care
Well, I tried the patches and they did not work (as expected), but I think we are closer. I will try to create some test cases using ip route to do what I want, and get back to folk when I have time. I have a ton of other reasons to want to grok the netlink code more deeply.

I note that I called this thread wet paint - It was Sunday, I had nothing to do but watch benchmarks run - I would like atomic updates to work. I keep poking at it as to why it doesn't. I do not know if it is a common meme outside the US to see a sign that says wet paint and to go touch it, to make sure.

The ongoing FIB tree rework going into Linux 4.0 and 4.1 struck me as a starting point to improve the kernel API (if needed), if we can't find a way to make it work better.

I am NOT pressuring to get it in any user space routing code presently! What I am trying to do is figure out how to fix the kernel (if needed), or do it more right in the routing daemons. (and by jove if we need kernel mods, there be an API to figure out if a newer API is available)

Linux TCP has sprouted the ability to handle massive re-ordering (on the order of megabytes) and there has been a ton of work on things like QUIC that are also highly tolerant, as well as things like torrent, which already handle it.

Babel (don't know about OLSR) finds a usable path, then tunes to a better one, but each tuning step (particularly at high rates) can lose packets, which cause rate reductions. Ideally I would like to never lose packets while tuning happens.

The most common case where I see an interruption in flows is when I go back from a wlan to an ethernet, under load, so being able to modify a route atomically to switch devices on the route was my end goal. The simple test I came up with earlier in the thread was not that, but seemed useful.
Re: [Babel-users] the routing atomic update wet paint - because *I* care
On Mon, Apr 6, 2015 at 7:22 PM, Dave Taht dave.t...@gmail.com wrote:
> Well, I tried the patches and they did not work (as expected), but I think we are closer. I will try to create some test cases using ip route to do what I want, and get back to folk when I have time. I have a ton of other reasons to want to grok the netlink code more deeply.

Strange... I definitely had some (ipv6) cases which ran fine with the new kernel but badly with the old one.

> I note that I called this thread wet paint - It was Sunday, I had nothing to do but watch benchmarks run - I would like atomic updates to work. I keep poking at it as to why it doesn't. I do not know if it is a common meme outside the US to see a sign that says wet paint and to go touch it, to make sure.
>
> The ongoing FIB tree rework going into Linux 4.0 and 4.1 struck me as a starting point to improve the kernel API (if needed), if we can't find a way to make it work better.
>
> I am NOT pressuring to get it in any user space routing code presently! What I am trying to do is figure out how to fix the kernel (if needed), or do it more right in the routing daemons. (and by jove if we need kernel mods, there be an API to figure out if a newer API is available)

I don't even think the API is bad... it's just badly documented. And libnl(1/2/3?) was painful enough that I decided not to use it at all for olsrd2.

> Linux TCP has sprouted the ability to handle massive re-ordering (on the order of megabytes) and there has been a ton of work on things like QUIC that are also highly tolerant, as well as things like torrent, which already handle it.
>
> Babel (don't know about OLSR) finds a usable path, then tunes to a better one, but each tuning step (particularly at high rates) can lose packets, which cause rate reductions. Ideally I would like to never lose packets while tuning happens.
>
> The most common case where I see an interruption in flows is when I go back from a wlan to an ethernet, under load, so being able to modify a route atomically to switch devices on the route was my end goal. The simple test I came up with earlier in the thread was not that, but seemed useful.

olsrd2 is a common linkstate protocol... it collects all the information about the links and runs a Dijkstra on them... then uses the result to set up the new version of the routes. The whole process runs asynchronously: I don't wait for the kernel return codes, I just set up all the necessary changes and then parse the netlink feedback later.

Henning
Re: [Babel-users] the routing atomic update wet paint - because *I* care
On Mon, Apr 6, 2015 at 3:03 PM, Juliusz Chroboczek j...@pps.univ-paris-diderot.fr wrote:
> > Babel (don't know about OLSR) finds a usable path, then tunes to a better one, but each tuning step (particularly at high rates) can lose packets, which cause rate reductions. Ideally I would like to never lose packets while tuning happens.
>
> I agree, but I would like to know how many packets we lose. Since the remove/insert happen in quick succession, I'd expect it to be very few.

My own noted issue is that at high rates, on cheezy routers, we run out of cpu, while forwarding packets. One daemon, hostapd, wants to run at a pretty high rate, and falls behind its desired rate... and the context switch and what little work it does alone - costs 80Mbits of forwarding, currently, on the archer tplink c7 v2. (can send along a graph)

You would hope that there would be no significant processing between syscalls in babel but it is hard to measure, and the easiest thing for me would merely to have been measuring the loss between atomic changes and not during the optimization phase.

As it is I will try to set up some artificial benchmarks showing how much packet loss there really is when going from a wan connection to ethernet, as opposed to reordering. It might be interesting to show how Windows behaves here as it has not yet got any decent mechanisms for handling reordering and slows down a lot.

And I will keep touching the wet paint.

A 4 phase commit seems feasible:

add new route with metric 1025
del old route metric 1024
add same route metric 1024
del same route metric 1025

But dang it a single syscall should be doable, and if it isn't then the kernel APIs need to be fixed.

And I do wish more of the routing folk out there tested their stuff at saturating workloads and differing RTTs such as what I do with netperf-wrapper rrul, rtt_fair, and rrul_be tests. I need to get on formalizing those tests for battlemesh.

> -- Juliusz
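The four steps listed above can be written down as an ordered plan. This is a minimal sketch under stated assumptions: the type and function names are hypothetical, "same route" is read as the new next hop, and the temporary metric is assumed to be the old metric plus one and otherwise unused. It only produces the plan; issuing the requests to the kernel is left out.

```c
#include <stddef.h>

enum step_op { STEP_ADD, STEP_DEL };
enum step_target { TARGET_OLD_NEXTHOP, TARGET_NEW_NEXTHOP };

/* One step of the plan: which operation, on which next hop, at which metric. */
struct step {
    enum step_op op;
    enum step_target target;
    int metric;
};

/* Fill out[0..3] with the four phases for a route currently installed at
   `metric`. Returns the number of steps written. */
static size_t
plan_four_phase(int metric, struct step out[4])
{
    int tmp = metric + 1;   /* assumed to be free for this destination */

    out[0] = (struct step){ STEP_ADD, TARGET_NEW_NEXTHOP, tmp };
    out[1] = (struct step){ STEP_DEL, TARGET_OLD_NEXTHOP, metric };
    out[2] = (struct step){ STEP_ADD, TARGET_NEW_NEXTHOP, metric };
    out[3] = (struct step){ STEP_DEL, TARGET_NEW_NEXTHOP, tmp };
    return 4;
}
```

The point of the ordering is that at every step at least one route to the destination stays installed, at the cost of four requests instead of one.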
Re: [Babel-users] the routing atomic update wet paint - because *I* care
> Babel (don't know about OLSR) finds a usable path, then tunes to a better one, but each tuning step (particularly at high rates) can lose packets, which cause rate reductions. Ideally I would like to never lose packets while tuning happens.

I agree, but I would like to know how many packets we lose. Since the remove/insert happen in quick succession, I'd expect it to be very few.

-- Juliusz
Re: [Babel-users] the routing atomic update wet paint - because *I* care
On Mon, Apr 6, 2015 at 5:03 PM, Dave Taht dave.t...@gmail.com wrote:
> On Mon, Apr 6, 2015 at 3:03 PM, Juliusz Chroboczek j...@pps.univ-paris-diderot.fr wrote:
> > > Babel (don't know about OLSR) finds a usable path, then tunes to a better one, but each tuning step (particularly at high rates) can lose packets, which cause rate reductions. Ideally I would like to never lose packets while tuning happens.
> >
> > I agree, but I would like to know how many packets we lose. Since the remove/insert happen in quick succession, I'd expect it to be very few.
>
> My own noted issue is that at high rates, on cheezy routers, we run out of cpu, while forwarding packets. One daemon, hostapd, wants to run at a pretty high rate, and falls behind its desired rate... and the context switch and what little work it does alone - costs 80Mbits of forwarding, currently, on the archer tplink c7 v2. (can send along a graph)
>
> You would hope that there would be no significant processing between syscalls in babel but it is hard to measure, and the easiest thing for me would merely to

Aha! It does help to write things down. I can merely get a timestamp between

gettimestamp()
del_route
add_route
gettimestamp()

and see to what extent that goes up or jitters under load, and compare that to the relative size of the packets at the rate they are forwarding at.

Groovy.

> have been measuring the loss between atomic changes and not during the optimization phase.
>
> As it is I will try to set up some artificial benchmarks showing how much packet loss there really is when going from a wan connection to ethernet, as opposed to reordering. It might be interesting to show how Windows behaves here as it has not yet got any decent mechanisms for handling reordering and slows down a lot.
>
> And I will keep touching the wet paint.
>
> A 4 phase commit seems feasible:
>
> add new route with metric 1025
> del old route metric 1024
> add same route metric 1024
> del same route metric 1025
>
> But dang it a single syscall should be doable, and if it isn't then the kernel APIs need to be fixed.
>
> And I do wish more of the routing folk out there tested their stuff at saturating workloads and differing RTTs such as what I do with netperf-wrapper rrul, rtt_fair, and rrul_be tests. I need to get on formalizing those tests for battlemesh.
>
> > -- Juliusz
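The timestamp idea above could look roughly like this. It is a sketch only: `time_del_add`, `route_op` and `count_call` are hypothetical names, the two operations are passed in as callbacks so that no real route is touched, and CLOCK_MONOTONIC is one reasonable clock choice, not something the thread specifies.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical callback type standing in for del_route / add_route. */
typedef int (*route_op)(void *ctx);

/* Run del_route then add_route and return the elapsed time in
   nanoseconds, or -1 if the clock or either operation fails. */
static long long
time_del_add(route_op del_route, route_op add_route, void *ctx)
{
    struct timespec t0, t1;

    if(clock_gettime(CLOCK_MONOTONIC, &t0) < 0)
        return -1;
    if(del_route(ctx) < 0)
        return -1;
    if(add_route(ctx) < 0)
        return -1;
    if(clock_gettime(CLOCK_MONOTONIC, &t1) < 0)
        return -1;

    return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
        (t1.tv_nsec - t0.tv_nsec);
}

/* Stand-in operation for trying the helper without touching the kernel:
   it only counts how often it was called. */
static int
count_call(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}
```

Dividing the measured window by the per-packet forwarding time at the current rate would give a rough upper bound on how many packets can arrive while no route is installed.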
Re: [Babel-users] the routing atomic update wet paint - because *I* care
On Mon, Apr 6, 2015 at 11:03 AM, Matthieu Boutier bout...@pps.univ-paris-diderot.fr wrote:
> > I think the unique key for the route is destination, routing table and metric. The metric part is important, if you put the routing protocol path cost into the route, atomic replacement will not work.
>
> Interesting (so the previous patch is wrong). Did you know about the source part? (RTA_SRC) Is it part of the key?

Phh... this is a good question... I would guess YES, otherwise the whole source-specific routing would not work.

In olsrd2 (and olsrd) we just set all parameters... but I remember the original olsrd had trouble when not using a static routing metric for the routing entry. In olsrd2 I always set a constant metric value for every route, that's why I mentioned the metric.

> Dave, perhaps this may be better -- note this has no chance to get into babeld:
>
> -const int has_atomic_replacement = has_ipv6_subtrees; /* Dave says that if a
> +const int has_atomic_replacement = has_ipv6_subtrees && !reflect_kernel_metric; /* Dave says that if a

Henning
Re: [Babel-users] the routing atomic update wet paint - because *I* care
On Mon, Apr 6, 2015 at 8:43 AM, Dave Taht dave.t...@gmail.com wrote:
> Is there anywhere a good reference to netlink?

I mostly use the source code of the ip route command. I don't think anyone ever got to the point of writing good documentation. Be happy, you could be working with the multicast routing API. ;)

> On Mon, Mar 16, 2015 at 11:32 AM, Henning Rogge hro...@gmail.com wrote:
> > On Mon, Mar 16, 2015 at 7:25 PM, Dave Taht dave.t...@gmail.com wrote:
> > > got code?
> >
> > I just used the same code I use for IPv4 (which has been working for atomic replacement for ages). I got confused when someone reported multiple IPv6 routes which I could not reproduce.
> >
> > http://olsr.org/git/?p=oonf.git;a=blob;f=src-plugins/subsystems/os_linux/os_routing_linux.c;h=4194fd04f8f259b686a1343cc906fc8649c6a7b6;hb=master
> >
> > If I understand it correctly you just need to set the routes with NLM_F_CREATE | NLM_F_REPLACE to get the atomic replacement.
>
> Well, I tried a naive approach to that for babel after looking over the olsr code and did not succeed at this level.
>
> I am curious as to whether this sequence of events works for olsr, in the first place.
>
> start it up on two machines
> add an ipv6 address on machine A)
> wait for it to show up in the routing table for B)
> delete the ipv6 address on machine A)
> See if it is unreachable, then gone on B)

I can do some tests tomorrow when I am back at work.

> What happens with the patch below (and variants), is that babel thinks it has modified the route but it is not gone.
>
> I don't understand how the operation could work in the first place actually... don't you have to feed both the old route and the new into the netmsg?

No... I think the unique key for the route is destination, routing table and metric. The metric part is important, if you put the routing protocol path cost into the route, atomic replacement will not work.

With these three the same, it should work... at least it worked for me in a current ubuntu VM.

Henning
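For reference, a request carrying the flags discussed above might be assembled as follows. This is a hedged sketch, not code from babeld or olsrd2: `struct route_req`, `add_attr` and `build_replace_request` are hypothetical names, the route is assumed to be an IPv6 unicast route in the main table, and `RTPROT_STATIC` is only a placeholder for whatever protocol id a daemon uses. The request is built but not sent, since sending needs a NETLINK_ROUTE socket and administrative privileges.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Request buffer: netlink header, route header, then room for attributes. */
struct route_req {
    struct nlmsghdr nh;
    struct rtmsg rtm;
    char attrs[128];
};

/* Append one route attribute. Returns 0 on success, -1 if it does not fit. */
static int
add_attr(struct route_req *req, unsigned short type,
         const void *data, unsigned short len)
{
    size_t off = NLMSG_ALIGN(req->nh.nlmsg_len);
    struct rtattr *rta;

    if(off + RTA_LENGTH(len) > sizeof(*req))
        return -1;
    rta = (struct rtattr *)((char *)req + off);
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);
    memcpy(RTA_DATA(rta), data, len);
    req->nh.nlmsg_len = off + RTA_LENGTH(len);
    return 0;
}

/* Build an RTM_NEWROUTE request with NLM_F_CREATE | NLM_F_REPLACE.
   Per the thread, the kernel should then overwrite an existing route
   that has the same destination, table and metric. */
static int
build_replace_request(struct route_req *req,
                      const unsigned char dest[16], unsigned char plen,
                      const unsigned char gate[16], int ifindex, int metric)
{
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    req->nh.nlmsg_type = RTM_NEWROUTE;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE;

    req->rtm.rtm_family = AF_INET6;
    req->rtm.rtm_dst_len = plen;
    req->rtm.rtm_table = RT_TABLE_MAIN;
    req->rtm.rtm_protocol = RTPROT_STATIC;   /* placeholder protocol id */
    req->rtm.rtm_scope = RT_SCOPE_UNIVERSE;
    req->rtm.rtm_type = RTN_UNICAST;

    if(add_attr(req, RTA_DST, dest, 16) < 0 ||
       add_attr(req, RTA_GATEWAY, gate, 16) < 0 ||
       add_attr(req, RTA_OIF, &ifindex, sizeof(ifindex)) < 0 ||
       add_attr(req, RTA_PRIORITY, &metric, sizeof(metric)) < 0)
        return -1;
    return 0;
}
```

Leaving out NLM_F_EXCL matters here: with NLM_F_EXCL set, a request for a route that already exists is expected to fail instead of replacing it.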
Re: [Babel-users] the routing atomic update wet paint - because *I* care
> Is there anywhere a good reference to netlink?

iproute2, libnl, kernel sources ?

> If I understand it correctly you just need to set the routes with NLM_F_CREATE | NLM_F_REPLACE to get the atomic replacement.

> -DEFINES = $(PLATFORM_DEFINES) -DVERSION=\"$(VERSION)\"
> +DEFINES = $(PLATFORM_DEFINES) -DVERSION=\"$(VERSION)\" -DIPV6_SUBTREES

IPV6_SUBTREES macro is out-of-date now: we use dynamic configuration.

> -if(operation == ROUTE_ADD) {
> -buf.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
> +switch(operation) {
> +case ROUTE_ADD:
> +buf.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE;

Why did you remove EXCL? And why use REPLACE here?

> +case ROUTE_MODIFY:
> +buf.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE;
> +buf.nh.nlmsg_type = RTM_NEWROUTE;

Seems reasonable (but I would keep EXCL).

> +if(operation == ROUTE_FLUSH) {
> + rtm->rtm_scope = RT_SCOPE_NOWHERE;
> +}

This should be a separate patch: why this change?

> +if(operation == ROUTE_FLUSH) {
> +*(int*)RTA_DATA(rta) = 0;
> +} else {
> *(int*)RTA_DATA(rta) = ifindex;
> -
> + }

Same, why this change?

The main flaw is that you never use the new* arguments: no chance for it to work.
You can try this untested patch (and note I have no clue how REPLACE should work):

$ git diff kernel_netlink.c
diff --git a/kernel_netlink.c b/kernel_netlink.c
index eb2e801..f115003 100644
--- a/kernel_netlink.c
+++ b/kernel_netlink.c
@@ -929,6 +929,9 @@ kernel_route(int operation, const unsigned char *dest, unsigned short plen,
     struct rtattr *rta;
     int len = sizeof(buf.raw);
     int rc, ipv4, table, use_src = 0;
+    const int has_atomic_replacement = has_ipv6_subtrees; /* Dave says that if a
+       kernel is new enough to do IPV6_SUBTREES, then it can do atomic
+       updates */
 
     if(!nl_setup) {
         fprintf(stderr,"kernel_route: netlink not initialized.\n");
@@ -948,6 +951,12 @@ kernel_route(int operation, const unsigned char *dest, unsigned short plen,
         }
     }
 
+    if(has_atomic_replacement && operation == ROUTE_MODIFY) {
+        gate = newgate;
+        ifindex = newifindex;
+        metric = newmetric;
+    }
+
     /* Check that the protocol family is consistent. */
     if(plen >= 96 && v4mapped(dest)) {
         if(!v4mapped(gate) ||
@@ -962,6 +971,7 @@ kernel_route(int operation, const unsigned char *dest, unsigned short plen,
         }
     }
 
+    if(has_atomic_replacement) {
     if(operation == ROUTE_MODIFY) {
         if(newmetric == metric && memcmp(newgate, gate, 16) == 0 &&
            newifindex == ifindex)
@@ -988,6 +998,7 @@ kernel_route(int operation, const unsigned char *dest, unsigned short plen,
         }
         return rc;
     }
+    }
 
     ipv4 = v4mapped(gate);
 
@@ -1019,9 +1030,16 @@ kernel_route(int operation, const unsigned char *dest, unsigned short plen,
     if(operation == ROUTE_ADD) {
         buf.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
         buf.nh.nlmsg_type = RTM_NEWROUTE;
-    } else {
+    } else if(operation == ROUTE_FLUSH) {
         buf.nh.nlmsg_flags = NLM_F_REQUEST;
         buf.nh.nlmsg_type = RTM_DELROUTE;
+    } else if(operation == ROUTE_MODIFY) {
+        buf.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL
+            | NLM_F_REPLACE;
+        buf.nh.nlmsg_type = RTM_NEWROUTE;
+    } else {
+        errno = EINVAL;
+        return -1;
     }
 
     rtm = NLMSG_DATA(buf.nh);

Matthieu
Re: [Babel-users] the routing atomic update wet paint - because *I* care
> Phh... this is a good question... I would guess YES, otherwise the whole source-specific routing would not work.

Of course, I was confused.

> -const int has_atomic_replacement = has_ipv6_subtrees; /* Dave says that if a
> +const int has_atomic_replacement = has_ipv6_subtrees && !reflect_kernel_metric; /* Dave says that if a

And this is wrong, sorry, use:

const int has_atomic_replacement = has_ipv6_subtrees && metric == newmetric && ifindex == newifindex;

Matthieu
Re: [Babel-users] the routing atomic update wet paint - because *I* care
> I think the unique key for the route is destination, routing table and metric. The metric part is important, if you put the routing protocol path cost into the route, atomic replacement will not work.

Interesting (so the previous patch is wrong). Did you know about the source part? (RTA_SRC) Is it part of the key?

Dave, perhaps this may be better -- note this has no chance to get into babeld:

-const int has_atomic_replacement = has_ipv6_subtrees; /* Dave says that if a
+const int has_atomic_replacement = has_ipv6_subtrees && !reflect_kernel_metric; /* Dave says that if a

Matthieu
Re: [Babel-users] the routing atomic update wet paint - because *I* care
Good as a reference... not that good for copying code directly.

Henning

On Mon, Apr 6, 2015 at 6:58 PM, Julien Cristau jcris...@debian.org wrote:
> On Mon, Apr 6, 2015 at 10:56:41 +0200, Matthieu Boutier wrote:
> > > Is there anywhere a good reference to netlink?
> >
> > iproute2, libnl, kernel sources ?
>
> All 3 of those are GPL, AFAICT? That doesn't make for a good reference if you want a permissive license for your code.
>
> Cheers,
> Julien