Willy,

On 15.01.19 at 15:32, Willy Tarreau wrote:
> Got it! I thought the problem was local to a process and that we
> replicated bad data, but in fact not, it's a distributed race. In
> this case there is no other short-term solution, and the drift has
> no reason to significantly accumulate over time. The only long-term
> solution I'd be seeing to work around this specific pattern would be
> to keep such values as differential pairs :
>   - count and synchronize the number of ++
>   - count and synchronize the number of --
> In this case the real value is the difference between the two. But
> it's a bit overkill and is still prone to other races when connections
> appear in parallel on the two peers. Then at this point better use an
> external aggregator.

Ideally the peers would exchange only their local values. These are
obviously correct (otherwise things would be broken even without peers).
Each instance then aggregates the values itself (e.g. sums up the
per-peer connection counts) to get the correct total.

It's a good example of a "long term" issue for the soon-to-come™ issue
tracker ;)

>
> OK I'm merging Tim's patch now.
>

It was not meant to actually be applied (that's why it was missing
backport information as well), because fixing just one counter only
fixes the symptoms. I suspect the other stick table values are affected
by a possible underflow as well.

But if you are fine with the patch, that's fine, I guess. A real
solution definitely requires breaking compatibility with the current
peer protocol.

Best regards
Tim Düsterhus
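
To illustrate the "exchange only local values" idea from the reply above,
here is a minimal sketch in Python. It is not HAProxy code and does not
reflect the current peer protocol: the `Peer` class, the `sync` helper
and all field names are hypothetical. The assumption is that each peer is
the only writer of its own counter, and that a peer stores the last value
it received from every other peer and sums them on read.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Peer:
    """Hypothetical peer; not HAProxy's actual data structure."""
    name: str
    local_conn_cur: int = 0  # written only by this peer
    # Last local value received from each remote peer (assumed layout).
    remote: Dict[str, int] = field(default_factory=dict)

    def connect(self) -> None:
        self.local_conn_cur += 1

    def disconnect(self) -> None:
        # No remote update ever overwrites the local value, so it only
        # reflects connections this peer has actually seen.
        if self.local_conn_cur > 0:
            self.local_conn_cur -= 1

    def receive(self, sender: str, value: int) -> None:
        # Overwrite, never add: the sender's value is authoritative.
        self.remote[sender] = value

    def total(self) -> int:
        # Aggregation happens on read, on every instance.
        return self.local_conn_cur + sum(self.remote.values())


def sync(a: Peer, b: Peer) -> None:
    """Exchange local values in both directions (hypothetical helper)."""
    b.receive(a.name, a.local_conn_cur)
    a.receive(b.name, b.local_conn_cur)


if __name__ == "__main__":
    a, b = Peer("a"), Peer("b")
    # Connections appear in parallel on both peers before any sync.
    a.connect()
    b.connect()
    b.connect()
    sync(a, b)
    print(a.total(), b.total())
```

Because a received value replaces the stored one instead of being merged
into a shared counter, parallel updates on two peers cannot overwrite each
other. The cost is one stored value per peer per counter, which is the
kind of change that would not fit the existing protocol.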

