On 23 February 2012 07:20, Willy Tarreau <[email protected]> wrote:
> Hi,
>
> On Wed, Feb 22, 2012 at 05:54:48PM +0100, Baptiste wrote:
>> Hey,
>>
>> Why not using "balance source" instead of using stick tables to do ip
>> source affinity?
>
> The main difference between "balance source" and "stick on src" is that
> with the former, when you lose a server, all clients are redistributed
> while in the second only the clients attached to the failed server are
> redistributed. Also when the failed server comes back, clients are moved
> again with "balance source". "balance source" + "hash-type consistent"
> at least fixes the first issue but the second one remains. I'm not fond of
> stick tables at all, but I must admit they address real world issues :-)
>
>> Note that the behavior you're observing is not a bug, it's by design
>> :) There is no master on the table.
>
> Upon restart, there is a special state where the new peer connects to
> other ones and asks them to dump all of their tables contents. So this
> issue should not happen at all or it's a bug. We already observed this
> behaviour during the development of the feature, but it's never been
> observed since it was released. Maybe we recently broke something. Mark,
> what version are you using ? Do you have any patches applied ?
>
> Regards,
> Willy
>

Thanks Willy. We have re-tested the replication across an haproxy
reload/restart and it appears it was working as you suggested, so
apologies for the noise.

We have noticed that when restarting or reloading, the table does sync
between the two processes on the same box, and also to the remote
peer, but each entry's expiry countdown is reset to the table's full
expire value rather than being carried over.

Is it possible to have the remaining expiry time of each entry
synchronised across this restart/reload as well?
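For reference, in case it helps with reproducing, the kind of setup we
are testing looks roughly like the following (a sketch only; the peer
names, addresses and backend names are made up):

```
peers mypeers
    peer haproxy1 192.168.0.1:1024
    peer haproxy2 192.168.0.2:1024

backend bk_web
    # entries expire after 30m; it is this countdown that appears
    # to restart from 30m after a reload/restart
    stick-table type ip size 200k expire 30m peers mypeers
    stick on src
    server srv1 10.0.0.1:80 check
    server srv2 10.0.0.2:80 check
```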

It has, however, raised another question: how best to clear the tables
on all appliances at the same time?

Say the devices were out of sync, or there was a problem somewhere
that left users being directed to the wrong place, resulting in a
requirement to clear the tables and start again.

We could use the "clear table" command on the stats socket via socat,
but that only clears one device at a time, so you could end up in a
state where you clear instance1 and then clear instance2, but in the
window between the two, some new users connected to instance1. When
you then clear instance2, those entries will not be re-synchronised.
This would be particularly noticeable with something long-lived such
as an RDP session: if instance1 were to fail again, instance2's
persistence table would be missing those entries.
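To be concrete, this is the sort of thing we run today (the socket
paths are examples; use whatever is set on your "stats socket" lines,
and substitute your own table name for bk_web):

```shell
# Clear the stick table on each instance in turn via its stats socket.
# Anything that connects between these two commands ends up only in
# instance1's table, which is the race described above.
echo "clear table bk_web" | socat stdio /var/run/haproxy1.sock
echo "clear table bk_web" | socat stdio /var/run/haproxy2.sock
```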

The only thing we have come up with so far is to put each of the
backend servers into maintenance mode first so they stop accepting new
connections, then clear the tables, then bring them back online again.
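In script form, the workaround we have in mind would look something
like this (an untested sketch; the backend, server names and socket
path are invented, and the same sequence would have to run against
every instance's socket):

```shell
SOCK=/var/run/haproxy.sock

# 1. Put the servers into maintenance mode so no new connections land
for srv in srv1 srv2; do
    echo "disable server bk_web/$srv" | socat stdio "$SOCK"
done

# 2. Clear the stick table while nothing new is arriving
echo "clear table bk_web" | socat stdio "$SOCK"

# 3. Bring the servers back online
for srv in srv1 srv2; do
    echo "enable server bk_web/$srv" | socat stdio "$SOCK"
done
```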

Do you have a neater method of clearing the tables that doesn't
require blocking users' access?

Mark
