Hi Mark,

On Thu, Feb 23, 2012 at 03:48:08PM +0000, Mark Brooks wrote:
> Thanks Willy, We have re-tested the replication across haproxy
> reload/restart and it appears it was working as you suggested. So
> apologies there.
You don't have to apologize, you might have encountered a real bug
which only appears once in a while. As I often say, reporting uncertain
bugs is better than nothing; at the very least it can prompt other
people to report a "me too".

Also, at Exceliance during some preliminary native-SSL tests, one of
our engineers noticed a bug which could possibly affect peers
replication after some error scenarios occur. It looks like if some
errors happen on the connection after a full replication, the next
connections will not necessarily restart the replication. It might be
what you observed. The fix has been pushed into the master tree and I'm
planning a -dev8 next week since enough fixes have stacked up there.

> We have seen that when restarting or reloading, the table syncs
> between 2 processes on the same box and also when it syncs to a
> remote peer, but the persistence timeout counter is reset to the
> maximum value and not carried with it.
> Is it possible to request that the persistence timeout counters
> sync across this restart/reload?

No, timers are not exchanged, only the server ID. A number of other
things would need to be synced too (eg: counters, etc...) but that's
still quite difficult to do, so for now sessions are refreshed upon
synchronization just as if there had been activity on them.

> It has however raised another question - how best to clear the tables
> on all appliances at the same time.

I unfortunately have no solution to this problem right now and I know
for sure that it can be annoying sometimes. It's not even
haproxy-specific, it's a general problem of how to make a piece of
information disappear from a global system when it's replicated in
real time and you can only destroy it on a single node at a time. Some
solutions would possibly involve sending deletion orders to other
nodes or just updating their expiration timers, I don't know for now.
I think it will be easier, or at least less critical, once the
expiration timers are shared!

(...)
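In the meantime, tables can at least be flushed node by node through
the admin stats socket. A rough sketch (the socket path and the table
name "bk_web" are just examples, adjust them to your setup):

```shell
# Flush all entries of the stick-table attached to backend "bk_web"
# on this node (does not propagate to peers):
echo "clear table bk_web" | socat stdio /var/run/haproxy.sock

# Check what remains in the table afterwards:
echo "show table bk_web" | socat stdio /var/run/haproxy.sock
```

This only acts on the local node, which is exactly the limitation
discussed above: entries still present on a peer can be pushed back.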
> The only thing we have been able to come up with so far is to put
> each of the backend servers in maintenance mode first so they stop
> accepting new connections, then clear the tables, then bring them
> back on-line again.

I think you could proceed differently: break the replication between
the nodes, clear all tables, then reopen replication. At least that
would not block user access nor traffic.

Regards,
Willy
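PS: for what it's worth, the procedure above could be scripted per
node roughly like this. Everything here is a hypothetical sketch: the
config paths, pid file and table name are examples, and "breaking
replication" is assumed to mean reloading with a config whose peers
section has been removed or commented out:

```shell
# 1. Reload with a config that has no peers section, so the node
#    stops exchanging table updates (soft reload, no traffic loss):
haproxy -f /etc/haproxy/haproxy-nopeers.cfg -p /var/run/haproxy.pid \
        -sf $(cat /var/run/haproxy.pid)

# 2. With replication broken, flush the table on this node:
echo "clear table bk_web" | socat stdio /var/run/haproxy.sock

# 3. Once every node has been cleared, reload the original config
#    to re-enable replication between the now-empty tables:
haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
        -sf $(cat /var/run/haproxy.pid)
```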

