Hi Sylvain,

On Thu, Dec 03, 2015 at 12:05:02PM +0100, Sylvain Faivre wrote:
> Hi,
>
> We just had a strange replication problem on our staging environment.
> We have 2 HAProxy servers. They had been running for 2 weeks.
> At the beginning, I checked that the stick tables were properly synced.
>
> Today, the stick tables were not synced, for example:
>
> root@proxy1>: echo "show table front_jsessionid" | socat stdio /usr/share/haproxy/mysocket
> # table: front_jsessionid, type: string, size:10485760, used:3
> 0xd1d944: key=MkJm-rbcE3V5EJg1RmTRAw__ use=0 exp=3474723 server_id=1
> 0xd762c4: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3571578 server_id=2
> 0xea9484: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2879968 server_id=2
>
> root@proxy2>: echo "show table front_jsessionid" | socat stdio /usr/share/haproxy/mysocket
> # table: front_jsessionid, type: string, size:10485760, used:5
> 0x1bc9104: key=D8EWzwrGr2UK3btCwfpweQ__ use=0 exp=3552238 server_id=1
> 0x1c3ba54: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3315450 server_id=1
> 0x1cbecb4: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2624851 server_id=2
> 0x1c49664: key=lmqFlaNhdHXYwH4M4QQjqQ__ use=0 exp=3510898 server_id=2
> 0x1cbf074: key=zpE6NHQJr~aStxzmJTVEgA__ use=0 exp=2587567 server_id=2
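When two `show table` dumps like the ones above diverge, a quick way to see exactly how is to diff them by key. The following is only a minimal Python sketch, not an HAProxy tool: the function names `parse_show_table` and `diff_tables` are hypothetical, and it assumes the entry line format shown in the dumps (`0x...: key=... use=... exp=... server_id=...`).

```python
import re

# Matches one entry line of a "show table" dump, e.g.
# 0xd1d944: key=MkJm-rbcE3V5EJg1RmTRAw__ use=0 exp=3474723 server_id=1
ENTRY_RE = re.compile(r"^0x[0-9a-f]+: key=(\S+) .*?server_id=(\d+)")


def parse_show_table(dump):
    """Return {key: server_id} for every entry line; the '# table:' header is skipped."""
    entries = {}
    for line in dump.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            entries[m.group(1)] = int(m.group(2))
    return entries


def diff_tables(a, b):
    """Return (keys only in a, keys only in b, shared keys whose server_id differs)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    mismatched = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, mismatched


proxy1 = """\
# table: front_jsessionid, type: string, size:10485760, used:3
0xd1d944: key=MkJm-rbcE3V5EJg1RmTRAw__ use=0 exp=3474723 server_id=1
0xd762c4: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3571578 server_id=2
0xea9484: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2879968 server_id=2
"""

proxy2 = """\
# table: front_jsessionid, type: string, size:10485760, used:5
0x1bc9104: key=D8EWzwrGr2UK3btCwfpweQ__ use=0 exp=3552238 server_id=1
0x1c3ba54: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3315450 server_id=1
0x1cbecb4: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2624851 server_id=2
0x1c49664: key=lmqFlaNhdHXYwH4M4QQjqQ__ use=0 exp=3510898 server_id=2
0x1cbf074: key=zpE6NHQJr~aStxzmJTVEgA__ use=0 exp=2587567 server_id=2
"""

only1, only2, mismatched = diff_tables(parse_show_table(proxy1), parse_show_table(proxy2))
print("only on proxy1:", only1)
print("only on proxy2:", only2)
print("server_id differs:", mismatched)
```

On the two dumps quoted above, this reports one key present only on proxy1, three present only on proxy2, and one shared key (`Z2rxYufqv7B0C2gIgztSfA__`) mapped to server_id 2 on proxy1 but to server_id 1 on proxy2.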
This is indeed not expected at all!

> I tried:
> echo "set table front_jsessionid key toto data.server_id 1" | socat stdio /usr/share/haproxy/mysocket
>
> The new entry was not replicated on proxy2, and tcpdump showed no
> outgoing traffic from proxy1 to proxy2.

So that means that proxy1 was in a bad state.

> Then, I reloaded the haproxy service on proxy2, and the tables were synced,
> including old data set in proxy1 before the reload.

Interesting. This means that not everything is broken: something makes the
communication stop without breaking the connection, but once the connection
is renewed, everything is fine again.

> New data replicates all right too.
>
> Should this situation happen again, is there anything I can do to debug
> this further?

Unfortunately there's not much, because I have not yet implemented the
"show peers" command that is on my todo list. Its purpose would be to dump
the information relevant to debugging the protocol, diagnosing connection
issues, etc.

> --------------------------------------------------------------
>
> Here is the netstat output from proxy1 while replication was broken:
>
> root@proxy1>: netstat -anp |grep 9421
> tcp 0 0 [proxy1]:9421 0.0.0.0:* LISTEN 1163/haproxy
> tcp 35 0 [proxy1]:9421 [proxy2]:41205 CLOSE_WAIT -
> tcp 1 0 [proxy1]:20512 [proxy2]:9421 CLOSE_WAIT 1163/haproxy
> (...)

This proves that a disconnection was already received but ignored by
haproxy. It's possible that the other side detected a timeout and tried
to reconnect without success.
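For readers less familiar with CLOSE_WAIT: a TCP socket enters that state once the peer's FIN has been received but the local application has not yet closed its own end, and netstat's Recv-Q column counts bytes received but not yet read by the application. The minimal Python sketch below uses plain loopback sockets (it is not HAProxy code, and `simulate_close_wait` is a hypothetical helper) to reproduce the situation: the peer sends a few bytes and closes, and the accepted socket stays in CLOSE_WAIT until the application reads up to EOF and closes. The 35-byte payload merely mirrors the Recv-Q value above.

```python
import socket


def simulate_close_wait(payload=b"x" * 35):
    """Return the bytes a server reads from a peer that sent `payload` and then closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()

    # The peer writes some data, then closes: its FIN reaches the server side.
    client.sendall(payload)
    client.close()

    # From here on, the server socket sits in CLOSE_WAIT with len(payload)
    # unread bytes in its receive queue (netstat's Recv-Q) until the
    # application drains it and closes its own end.
    received = b""
    while True:
        chunk = server.recv(4096)
        if not chunk:  # recv() returning b"" is how the peer's FIN shows up
            break
        received += chunk

    server.close()  # sends our FIN; the socket leaves CLOSE_WAIT (for LAST_ACK)
    listener.close()
    return received
```

The step that the peers connections apparently skipped is the last one: noticing the empty read (EOF), closing the socket and reconnecting. Leaving it out keeps the socket in CLOSE_WAIT indefinitely, with the unread bytes still counted in Recv-Q.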
> Here are the relevant parts of our setup:
>
> peers prod
>     peer proxy1 [proxy1]:9421
>     peer proxy2 [proxy2]:9421
>
> backend front
>     stick on urlp(jsessionid),url_dec table front_jsessionid
>     stick on urlp(jsessionid,;),url_dec table front_jsessionid
>     stick on cookie(JSESSIONID) table front_jsessionid
>     stick store-response cookie(JSESSIONID) table front_jsessionid
>
> backend front_jsessionid
>     stick-table type string len 24 size 10m expire 1h peers prod

OK, it looks fine.

> We are running HAProxy 1.6.2 from the vbernat PPA on Ubuntu 12.04, with
> nbproc = 1.

Thanks for this detail. All I can say for now is that you clearly
encountered a bug, but we don't know yet what this bug is. We'll have to
check the code for something which could cause this. It would be
interesting to know how long the process had been running before the issue
appeared, even though it will not tell us how long the connection remained
alive.

Thanks,
Willy

