Hi Juliusz,

On Tue, Mar 07, 2023 at 01:20:28PM +0100, Juliusz Chroboczek wrote:
> I have no signal processing background whatsoever; to my eyes, signal
> processing is a fairly advanced form of magic.  (My background is in logic
> and programming languages.)
>
> To be honest, we hacked things until we had acceptable worst-case
> behaviour.

Not to worry, there's really not that much to know that's not on
Wikipedia :)

> We had two networks to experiment with: Nexedi's production network
> (hundreds of tunnels over the public IPv6 Internet) and a simulated
> network we built ourselves which we believed represented the worst case (a
> bufferbloated diamond network).  We built a first prototype, which we
> instrumented to log RTT samples and route flaps, and noticed three things:

How was the simulation setup built? I'd be interested in trying out my
implementation in the worst case, with real queues in the feedback path.

See, from a signals/systems perspective what we're trying to do here looks
an awful lot like a closed-loop control system[1]: we measure something and
feed the measurement back into the system (via the FIB), which in turn
influences the measurements (RTT).

In this framework, one thing you can do to analyze system stability is to
"cut open" the feedback path, look at the combination of feedback path and
controller (the transfer function) and apply pole-zero analysis[2] to
determine stability. Note this can be done either experimentally, by
measuring the frequency response (the "Bode plot"), or by theoretical
modelling.

[1]: https://en.wikipedia.org/wiki/Closed-loop_transfer_function
[2]: https://en.wikipedia.org/wiki/Control_theory#Stability

This is all very well studied, since engineers generally don't like their
physical systems shaking themselves apart :)

The problem is that our feedback path includes network queues, which AFAICT
are highly nonlinear and, due to the nature of the Internet, certainly not
time-invariant, so treating this as an LTI system is questionable. However,
perhaps the timescales at which the network queues operate and the network
changes are far enough from our control timescale that such an analysis
could still be useful.

> 1. in the production network, the RTT signal is noisy (see figures 4 and 6
>    of [1]);
>
> 2. in the bufferbloated diamond network, when we switch away from
>    a congested route, we switch back too early, before the buffers have
>    had time to drain;
>
> 3. in the diamond network, we tend to switch routes as we oscillate around
>    a common value.
>
> Hence, the three mechanisms:
>
> 1. smoothing of the RTT, to make the signal less noisy; the smoothing is
>    exponential just because it's easy to implement;
>
> 2. saturating map from RTT to metric, so that congested routes all appear
>    equally bad, and we don't switch back before the buffers drain; this
>    was stolen from [2];
>
> 3. hysteresis, in order to avoid switching with too high a frequency.

Right, so my plan of attack is to combat 3. with the filter in this patch
instead of, or (depending on results) in addition to, hysteresis. If it
turns out hysteresis is needed, I would go for a simple stateful
implementation so as to not have to reintroduce the smoothing.

I'm not sure we really need to do anything about 1. The route damping
filter (potentially plus hysteresis) will already limit the frequency of
route changes as desired, so why should we want to apply additional
filtering to the RTT estimates? Instead of cascading filters I would just
tweak the frequency response of that one filter.

I managed to build a simple digital simulation of the rearming-timer filter
in this patch, and when testing with random-frequency or sweep inputs it
does work as expected. There are some artifacts (lower than expected output
frequency) around the cutoff frequency, which are due to the rearming
nature of the timer, but that seems fine.

The next step would be to add the smoothing filter to the simulation too
and compare. I haven't figured out how to characterize the response of the
hysteresis yet, so that's something I still have to look at.

> Daniel, if you feel you're competent to work on that, I'd be interested in
> collaborating.  I don't currently have funding for Babel, but it should be
> easy enough to find some.

I'm not sure about "competent"; TBH I haven't used any of this knowledge in
years. I just learned this stuff in my electronics-engineering-centered
secondary education. I am definitely not the right person to write this up
rigorously, but I might just know enough to be dangerous (:

Thanks,
--Daniel
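
P.S. To have something concrete to poke at when comparing approaches, here
is a rough Python sketch of the three mechanisms from the quoted list
(exponential smoothing, saturating RTT-to-metric map, hysteresis). This is
not the babeld code and not the filter from this patch; every function
name, parameter and constant below is a placeholder I made up for
illustration.

```python
# Illustrative sketch only: names and constants are hypothetical,
# not taken from babeld or from the patch under discussion.

def smooth(prev, sample, alpha=0.125):
    """Exponential smoothing: move the estimate a fraction alpha
    of the way from the previous value towards the new sample."""
    return prev + alpha * (sample - prev)


def rtt_to_metric(rtt_ms, rtt_min=10.0, rtt_max=120.0, max_penalty=150):
    """Saturating map from RTT to a metric penalty: zero below rtt_min,
    linear in between, capped at max_penalty above rtt_max, so that
    all congested routes look equally bad."""
    if rtt_ms <= rtt_min:
        return 0
    if rtt_ms >= rtt_max:
        return max_penalty
    return round(max_penalty * (rtt_ms - rtt_min) / (rtt_max - rtt_min))


def select_route(current, metrics, margin=32):
    """Hysteresis: keep the current route unless the best alternative
    beats it by more than margin. metrics maps route name to metric."""
    best = min(metrics, key=metrics.get)
    if current is None or current not in metrics:
        return best
    if metrics[current] - metrics[best] > margin:
        return best
    return current
```

With these placeholder values, a route at metric 100 is kept when the
alternative sits at 90, and dropped once the alternative reaches 50. The
margin is the knob that trades switching frequency against how long we stay
on a worse route.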
