This ended up being a deeply philosophical digression into routing behaviors that I think I'll have to blog about, with pictures, to fully describe.
What I want is a world of ubiquitous always-on connectivity[1] - where you can be at your desk with 20 connections nailed up, listening to an audio stream, doing a big upload or download - then pull your box out of the ethernet dock, go to wifi, move to another room, plug in again, and have everything survive and take advantage of the better link after a few seconds.

8+ years ago, with ahcp and babel, and a network configured to use them with a single static ip address on both the ethernet and wifi, I could do that. My own networks were set up that way, anyway... I did it all the time. It was wonderful. I never had to think about it.

It was massively disconcerting to attempt to move back into the "regular" world, where wifi and ethernet were treated as distinct, where taking an interface offline lost its address, and where, as part of homenet, taking a new /64 was considered mandatory and no host changes were allowed. I'd switch to how things were done "in the real world" - get up from my desk - and despite having both the wifi and ethernet online at the same time, all my connections would drop. Agh....

Sure, new protocols like mosh-multipath, quic, etc, recover from a move, but they don't... that wasn't the case I was testing. I was testing multiple routes through the middle of the network, where I'd hope for better behavior while there is load.
What I currently get from trying to do failover in the middle of the network, using the -l option and the supplied patch, is that the failover is usually not quite quick enough, and one or more connections fail like this (using the flent rrul test here):

    Program output:
      netperf: send_omni: recv_data failed: No route to host
      netperf: send_omni: recv_data failed: No route to host
      Interim result:   33.47 10^6bits/s over 0.200 seconds ending at 1461547666.713
      Interim result:   22.99 10^6bits/s over 0.201 seconds ending at 1461547666.914
      Interim result:

I've harped on a need for atomic updates, but I still think that a userspace routing daemon simply can't react fast enough to a change in an ethernet routing table to prevent no-route messages being sent to one or more flows on a busy link when it goes down.

So I got a mildly better result by installing a static backup link, like this:

    172.26.64.0/24 via 172.26.64.1 dev usbnet0  proto babel onlink
    172.26.64.0/24 dev usbnet0  proto kernel  scope link  src 172.26.64.231  metric 100
    172.26.64.0/24 via 172.26.16.5 dev eth0  metric 200

With this in place, the traffic survives the "ifconfig usbnet0 down" event better.

I imagine that putting the "3 best routes" into the kernel RIB is not something most meshy daemons do?

A newer problem that I haven't thunk much about before is that babel aims for a stable route. So if I have 3 routes - one stable but lousy, and two better ones that both flap twice in under 60 seconds or so - we end up choosing the stablest route, sometimes for a very long time. I still see many seconds before stuff recovers in some instances.

[1] http://frankston.com/public/?n=IAC.UAC

_______________________________________________
Babel-users mailing list
Babel-users@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/babel-users
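For readers who want the static backup route idea spelled out, here is a minimal Python sketch of why a pre-installed higher-metric route can help. It is a toy model under stated assumptions, not the kernel's actual lookup code: it assumes that among routes for the same prefix the lowest metric wins, and that routes through a downed interface stop being usable without any help from a userspace daemon. The names `Route` and `best_route` are hypothetical, and the babel route is given metric 0 only because the listing above shows no metric for it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Route:
    """One routing-table entry for a prefix (hypothetical toy model)."""
    prefix: str
    dev: str
    metric: int
    via: Optional[str] = None
    proto: str = "static"


def best_route(table: Iterable[Route], up_devs: Set[str]) -> Optional[Route]:
    """Return the lowest-metric route whose interface is up, or None.

    Assumption: routes through a downed interface are unusable at once,
    so the next-best entry takes over with no daemon involvement.
    """
    candidates = [r for r in table if r.dev in up_devs]
    if not candidates:
        return None  # corresponds to "No route to host"
    return min(candidates, key=lambda r: r.metric)


# The three routes from the listing above; metric 0 for the babel route
# is an assumption (no metric is shown for it).
TABLE = [
    Route("172.26.64.0/24", "usbnet0", 0, via="172.26.64.1", proto="babel"),
    Route("172.26.64.0/24", "usbnet0", 100, proto="kernel"),
    Route("172.26.64.0/24", "eth0", 200, via="172.26.16.5"),
]

if __name__ == "__main__":
    # Both links up: the babel route is preferred.
    print(best_route(TABLE, {"usbnet0", "eth0"}))
    # usbnet0 goes down: the static eth0 backup is already in place.
    print(best_route(TABLE, {"eth0"}))
    # Without any surviving route, flows would see "No route to host".
    print(best_route(TABLE, set()))
```

In this model the backup entry is what closes the window between the interface going down and the routing daemon installing a replacement route. Whether a real kernel behaves exactly this way depends on its version and configuration.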