On 10/27/08 17:09, Darren Reed wrote:
On 10/27/08 09:31, Michael Schuster wrote:
Darren,

thanks for your comments. Some answers/reflections below:

On 10/26/08 19:43, Darren Reed wrote:
..
Health Checks.
==============
This design has a single daemon, with a single thread,
that polls multiple servers to update a single pool of
data in the kernel.

If we assume that the in-kernel handling of requests
from the daemon enforces MP-safety, why not run multiple
daemons?

actually, it's the daemon that will serialise access to the kernel.

If your kernel interfaces aren't MP-safe then you need to fix
them so that they are. It is not acceptable to require the
daemon to ensure the integrity of data inside the kernel.
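
For illustration, a minimal sketch of what that could look like on
the kernel side (hypothetical names throughout, not the actual ilb
code): the ioctl handler itself takes the lock that protects the
shared pool, so any number of userland callers can update it safely:

#include <sys/mutex.h>

typedef struct ilb_pool {
        kmutex_t        ip_lock;        /* protects the server list */
        int             ip_nservers;
        /* ... per-server entries ... */
} ilb_pool_t;

/* called from the ioctl path for each daemon request */
int
ilb_pool_update(ilb_pool_t *pool, const void *req)
{
        mutex_enter(&pool->ip_lock);
        /* apply the add/remove/disable request to the pool */
        mutex_exit(&pool->ip_lock);
        return (0);
}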


i.e. run an ilbd per back-end server (or at least a
thread per back-end server). You might still need a
single daemon to act as the manager? *shrug*

this sounds like you're replacing the cost of repeatedly starting health-check processes with the cost of having just as many processes sitting around idle most of the time.

Yup.

Darren,

Assuming I have 100 back-end servers, your suggestion requires 100 ilbd instances. What exactly would be the benefit of this design that would justify the added complexity?

BTW, for Phase 1 we have decided to implement the external health checks in the same way as the ping and tcp/udp probes. Depending on what gets the most use by administrators, we may change the implementation of the ping and tcp/udp probes in a later phase.

Since in the current design ilbd maintains quite a bit of state, one would indeed have to coordinate all the information to be able to get the "complete" picture again, so the added benefit seems a little elusive to me here.

What state is there to manage that needs to be shared?
And if there is such state, why isn't it talked about
in the design doc?

So far as health checks go, the ilbd is responsible for:
- ensuring that all of the destinations are periodically
  probed and
- ensuring that the list of in-kernel destinations matches
  those that are successfully responding to probes.

One way to do that is to have a big program that polls each
one in turn, with lots of complexity to ensure that nobody
causes the program to pause too long and everyone gets
serviced in turn, all inside one big loop. There is a lot
of state held, but it is all still per-destination.
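
To make that concrete, here is a rough sketch of the single-loop
style (hypothetical code, not from ilbd, with send_probe() standing
in for whatever probe is used): one thread drives non-blocking probes
for every destination and tracks per-destination deadlines, so a slow
server can't stall the rest:

#include <poll.h>
#include <time.h>

#define NDEST           100
#define PROBE_TIMEOUT   5       /* seconds before a probe is failed */

typedef struct dest {
        int     fd;             /* non-blocking probe socket */
        time_t  deadline;       /* when the outstanding probe times out */
        int     healthy;
} dest_t;

extern void send_probe(dest_t *);       /* hypothetical, non-blocking */

static dest_t dests[NDEST];

void
poll_loop(void)
{
        struct pollfd pfds[NDEST];
        time_t now;
        int i;

        for (;;) {
                for (i = 0; i < NDEST; i++) {
                        pfds[i].fd = dests[i].fd;
                        pfds[i].events = POLLIN;
                }
                /* wake up at least once a second to check deadlines */
                (void) poll(pfds, NDEST, 1000);

                now = time(NULL);
                for (i = 0; i < NDEST; i++) {
                        if (pfds[i].revents & POLLIN) {
                                /* reply arrived (a real loop would
                                 * also read it off the socket) */
                                dests[i].healthy = 1;
                        } else if (now >= dests[i].deadline) {
                                dests[i].healthy = 0;   /* timed out */
                        } else {
                                continue;       /* still waiting */
                        }
                        /* start the next probe for this destination */
                        send_probe(&dests[i]);
                        dests[i].deadline = now + PROBE_TIMEOUT;
                }
        }
}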


This should also remove the ilbd main loop from being a
critical section of code, where a slowdown in dealing
with one external server can impact all of the others.
Instead, the work is left to the kernel, which schedules
threads/processes depending on who's busy or blocked, etc.
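
Compare with a rough sketch of the thread-per-destination
alternative (again hypothetical, with blocking_probe() standing in
for whatever probe is used): each back-end server gets a thread that
is free to block, and the kernel keeps the others running:

#include <pthread.h>
#include <unistd.h>

#define PROBE_INTERVAL  10      /* seconds between probes */

typedef struct dest {
        const char      *addr;
        int             healthy;
} dest_t;

extern int blocking_probe(const char *);        /* hypothetical probe */

static void *
probe_thread(void *arg)
{
        dest_t *d = arg;

        for (;;) {
                /* a stall in here delays only this destination */
                d->healthy = blocking_probe(d->addr);
                (void) sleep(PROBE_INTERVAL);
        }
        /* NOTREACHED */
        return (NULL);
}

void
start_probes(dest_t *dests, int n)
{
        pthread_t tid;
        int i;

        for (i = 0; i < n; i++)
                (void) pthread_create(&tid, NULL, probe_thread, &dests[i]);
}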

anything that we expect to block (health checks) is farmed out to processes, so that happens anyway.

The design document does not reflect this at all.

Darren

