Re: [PATCH] [RFC] Babel: Implement route daming with fixed delay

dxld Tue, 07 Mar 2023 09:08:17 -0800

Hi Juliusz,

On Tue, Mar 07, 2023 at 01:20:28PM +0100, Juliusz Chroboczek wrote:
> I have no signal processing background whatsoever; to my eyes, signal
> processing is a fairly advanced form of magic.  (My background is in logic
> and programming languages.)
>
> To be honest, we hacked things until we had acceptable worst-case
> behaviour.


Not to worry, there's really not that much to know that's not on Wikipedia :)

> We had two networks to experiment with: Nexedi's production network
> (hundreds of tunnels over the public IPv6 Internet) and a simulated
> network we built ourselves which we believed represented the worst case (a
> bufferbloated diamond network).  We built a first prototype, which we
> instrumented to log RTT samples and route flaps, and noticed three things:

How was the simulation setup built? I'd be interested in trying out my
implementation in the worst case, with real queues in the feedback path.

See, from a signals/systems perspective what we're trying to do here looks
an awful lot like a closed-loop control system[1]: we measure something and
feed the measurement back into the system (via the FIB), which in turn
influences the measurements (RTT).

In this framework, one thing you can do to analyze system stability is to
"cut open" the feedback path, look at the combination of feedback path and
controller (the transfer function) and apply pole-zero analysis[2] to
determine stability. Note this can be done either experimentally, by
measuring the frequency response (the "Bode plot"), or by theoretical
modelling.

[1]: https://en.wikipedia.org/wiki/Closed-loop_transfer_function
[2]: https://en.wikipedia.org/wiki/Control_theory#Stability

This is all very well studied since engineers generally don't like their
physical systems shaking themselves apart :)

The problem is that our feedback path includes network queues, which AFAICT
are highly nonlinear and, due to the nature of the Internet, certainly not
time-invariant, so treating this as an LTI system is questionable.

However, perhaps the timescales at which the network queues operate and the
network changes are far enough from our control timescale that such an
analysis could still be useful.

> 1. in the production network, the RTT signal is noisy (see figures 4 and 6
>    of [1]);
> 2. in the bufferbloated diamond network, when we switch away from
>    a congested route, we switch back too early, before the buffers have
>    had time to drain;
> 
> 3. in the diamond network, we tend to switch routes as we oscillate around
>    a common value.
>
> Hence, the three mechanisms:
> 
> 1. smoothing of the RTT, to make the signal less noisy; the smoothing is
>    exponential just because it's easy to implement;
> 
> 2. saturating map from RTT to metric, so that congested routes all appear
>    equally bad, and we don't switch back before the buffers drain; this
>    was stolen from [2];
> 
> 3. hysteresis, in order to avoid switching with too high a frequency.
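
For reference, the three mechanisms quoted above can be sketched roughly as
below. This is only an illustration, not the code of any actual Babel
implementation: the function names, the default parameter values and the
simple margin-based form of hysteresis are all assumptions made for the
example.

```python
def smooth_rtt(prev, sample, alpha=0.125):
    """Mechanism 1: exponential smoothing of RTT samples.

    prev is None before the first sample; alpha is a hypothetical weight.
    """
    if prev is None:
        return sample
    return (1 - alpha) * prev + alpha * sample


def rtt_penalty(rtt_ms, rtt_min=10.0, rtt_max=120.0, max_penalty=150.0):
    """Mechanism 2: saturating map from RTT to a metric penalty.

    Below rtt_min the penalty is 0, above rtt_max it is capped at
    max_penalty, so all congested routes look equally bad.
    The thresholds here are illustrative values only.
    """
    if rtt_ms <= rtt_min:
        return 0.0
    if rtt_ms >= rtt_max:
        return max_penalty
    return max_penalty * (rtt_ms - rtt_min) / (rtt_max - rtt_min)


def select(current, metrics, margin=32.0):
    """Mechanism 3: hysteresis in route selection.

    metrics maps route name -> metric (lower is better). We only leave
    the current route if the best one wins by more than `margin`
    (a hypothetical value).
    """
    best = min(metrics, key=metrics.get)
    if current is None or current not in metrics:
        return best
    if metrics[current] - metrics[best] > margin:
        return best
    return current
```

With these assumed values, a route that is only slightly better than the
current one (say metric 90 against 100) does not cause a switch, while a
clearly better one (50 against 100) does.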

Right, so my plan of attack is to combat 3. with the filter in this patch
instead of or (depending on results) in addition to hysteresis.
If it turns out hysteresis is needed, I would go for a simple stateful
implementation so as to not have to reintroduce the smoothing.

I'm not sure we really need to do anything about 1. The route damping
filter (potentially plus hysteresis) will already limit the frequency of
route changes as desired, so why would we want to apply additional filtering
to the RTT estimates? Instead of cascading filters I would just tweak
the frequency response of that one filter.

I managed to build a simple digital simulation of the rearming-timer
filter in this patch, and when testing with random-frequency or sweep inputs
it does work as expected. There are some artifacts (lower than expected
output frequency) around the cutoff frequency, which are due to the rearming
nature of the timer, but that seems fine.
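
To give an idea of what such a simulation can look like, here is a minimal
discrete-time sketch. It is not the patch's code: it assumes the filter
behaves like a debounce, i.e. every input change (re)arms a timer and the
output only follows the input once the input has held steady for `delay`
ticks. All names are made up for the example.

```python
def debounce(samples, delay):
    """Rearming-timer filter over a sequence of per-tick input values.

    Any change of the input rearms the timer; the output adopts the
    pending input only when the timer has run down to zero.
    """
    out = []
    output = samples[0] if samples else None
    pending = output
    timer = 0
    for x in samples:
        if x != pending:
            pending = x
            timer = delay      # input changed: (re)arm the timer
        elif timer > 0:
            timer -= 1         # input steady: let the timer run down
        if timer == 0:
            output = pending   # timer expired: output follows input
        out.append(output)
    return out


def square_wave(half_period, ticks):
    """0/1 square wave that toggles every half_period ticks."""
    return [(t // half_period) % 2 for t in range(ticks)]


def transitions(seq):
    """Number of value changes in seq, a crude frequency measure."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


if __name__ == "__main__":
    delay = 3
    for half_period in (1, 2, 3, 4, 10):
        inp = square_wave(half_period, 100)
        outp = debounce(inp, delay)
        print(half_period, transitions(inp), transitions(outp))
```

In this sketch an input that toggles at least once every `delay` ticks never
reaches the output, since the timer is rearmed before it can expire, while a
slow input passes through delayed by `delay` ticks.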

The next step would be to add the smoothing filter to the simulation too and
compare. I haven't figured out how to characterize the response of the
hysteresis yet, so that's something I still have to look at.

> Daniel, if you feel you're competent to work on that, I'd be interested in
> collaborating.  I don't currently have funding for Babel, but it should be
> easy enough to find some.

I'm not sure about "competent"; TBH I haven't used any of this knowledge in
years. I just learned this stuff in my electronics-engineering-centered
secondary education. I am definitely not the right person to write this up
rigorously, but I might just know enough to be dangerous (:

Thanks,
--Daniel
