On Thu, Dec 03, 2015 at 03:56:45PM +0100, Sylvain Faivre wrote:
> According to our logs, both HAProxy processes were started at Nov 24 
> 11:25:xx and application errors caused by lack of session replication 
> started happening at Dec  1 17:05:35.
> So that's a bit more than 1 week later.
> 
> We'll keep looking for replication errors in the next days, 
> especially in 1 week, and report back.

OK thank you. I hope it will not affect your activity too much :-/
That just makes me think that having an option to shut down a peers
session on the CLI would help.

Hmm, in fact there is a way to observe the peers session, since peers now
use regular sessions. If you issue "show sess" on the CLI, you'll see some
sessions with "fe=<name-of-the-local-peer>" (yes, I know the name is poorly
chosen and may cause confusion) :

> show sess
0x7ce468: proto=unix_stream src=unix:1 fe=GLOBAL be=<NONE> srv=<none> ts=02 age=2m11s calls=1 rq[f=c08202h,i=0,an=00h,rx=,wx=,ax=] rp[f=80048202h,i=0,an=00h,rx=,wx=,ax=] s0=[7,ch,fd=9,ex=] s1=[7,4018h,fd=-1,ex=] exp=23h57m
0x7cdd98: proto=tcpv4 src=127.0.0.1:38831 fe=myhost1 be=<NONE> srv=<none> ts=0a age=0s calls=1 rq[f=c08200h,i=0,an=00h,rx=4s,wx=,ax=] rp[f=80048202h,i=0,an=00h,rx=,wx=,ax=] s0=[7,48h,fd=8,ex=] s1=[7,4058h,fd=-1,ex=] exp=4s

Here it's the second line (my local peer is called "myhost1"). The first line
is my CLI session.
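If you want to pick the peer session out of a long "show sess" listing from a
script, a minimal Python sketch could look like this. It assumes each session
is printed on a single line starting with its pointer and containing
"fe=<name>", and that you know your local peer name; `peer_session_ids` is a
hypothetical helper, not something HAProxy provides.

```python
import re

# A "show sess" line starts with the session pointer, e.g. "0x7cdd98: proto=...",
# and carries the frontend name as "fe=<name>".
SESSION_LINE = re.compile(r"^(0x[0-9a-fA-F]+): .*?\bfe=(\S+)")


def peer_session_ids(show_sess_output: str, local_peer: str) -> list[str]:
    """Return the pointers of sessions whose frontend is the local peer name.

    Hypothetical helper: assumes one session per line in the CLI output.
    """
    ids = []
    for line in show_sess_output.splitlines():
        match = SESSION_LINE.match(line)
        if match and match.group(2) == local_peer:
            ids.append(match.group(1))
    return ids
```

With local_peer="myhost1", the dump above would yield ["0x7cdd98"], which is
the ID to pass to "show sess <session_id>".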

Then "show sess <session_id>" will show more details (the ID is the pointer
at the beginning of the line) :

> show sess 0x7cdd98
0x7cdd98: [03/Dec/2015:16:01:58.200647] id=81 proto=tcpv4 source=127.0.0.1:38831
  flags=0x80, conn_retries=3, srv_conn=(nil), pend_pos=(nil)
  frontend=myhost1 (id=4294967295 mode=tcp), listener=? (id=0) addr=127.0.0.1:8521
  backend=<NONE> (id=-1 mode=-)
  server=<NONE> (id=-1)
  task=0x7cab40 (state=0x0a nice=0 calls=1 exp=0s age=4s)
  si[0]=0x7cdf90 (state=EST flags=0x48 endp0=CONN:0x7ca940 exp=<NEVER>, et=0x000)
  si[1]=0x7cdfb0 (state=EST flags=0x4058 endp1=APPCTX:0x7ce100 exp=<NEVER>, et=0x000)
  co0=0x7ca940 ctrl=tcpv4 xprt=RAW data=STRM target=LISTENER:0x7cb688
      flags=0x0020b306 fd=8 fd.state=25 fd.cache=0 updt=0
  app1=0x7ce100 st0=7 st1=0 st2=0 applet=<PEER>
  req=0x7cdda8 (f=0xc08200 an=0x0 pipe=0 tofwd=-1 total=37)
      an_exp=<NEVER> rex=0s wex=<NEVER>
      buf=0x77ba20 data=0x77ba34 o=0 p=0 req.next=0 i=0 size=0
  res=0x7cdde8 (f=0x80048202 an=0x0 pipe=0 tofwd=-1 total=4)
      an_exp=<NEVER> rex=<NEVER> wex=<NEVER>
      buf=0x77ba20 data=0x77ba34 o=0 p=0 rsp.next=0 i=0 size=0

You can even shut it down :
> shutdown session 0x7cdd98
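These commands can also be issued from a script. Below is a minimal Python
sketch, assuming a UNIX stats socket declared with something like
"stats socket /var/run/haproxy.sock level admin" (the path is only an example;
"shutdown session" requires the admin level). In non-interactive mode HAProxy
answers one command and closes the connection, so the sketch simply reads
until EOF. The function name `haproxy_cli` is hypothetical.

```python
import socket


def haproxy_cli(socket_path: str, command: str, timeout: float = 5.0) -> str:
    """Send one command to an HAProxy stats socket and return the full reply.

    Assumes a UNIX stats socket in non-interactive mode: HAProxy closes the
    connection after answering, so reading until EOF collects the whole reply.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")
```

For example, haproxy_cli("/var/run/haproxy.sock", "show sess 0x7cdd98") returns
the detailed dump, and "shutdown session 0x7cdd98" as the command kills it.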

Here I have no traffic, so the session expires every 5 or 10 seconds and is
renewed. Obviously, this should not happen in your case.

So such a dump on both nodes before reloading could significantly help if
you face the issue again. If it's too complicated to pick only this session,
you can issue "show sess all", send the huge output to a file and then dig
for "<PEER>" in this file. That might be easier in an emergency.
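Once the output of "show sess all" is saved to a file, the digging can be
automated. The following is a minimal Python sketch, assuming each session's
detailed block starts with its pointer at the beginning of a line (as in the
"show sess <session_id>" output above) and that peer sessions contain
"applet=<PEER>"; `peer_blocks` is a hypothetical helper.

```python
import re

# Each detailed session block begins with its pointer, e.g. "0x7cdd98: [03/Dec/...".
BLOCK_START = re.compile(r"^0x[0-9a-fA-F]+: ")


def peer_blocks(dump: str) -> list[str]:
    """Split a "show sess all" dump into session blocks and keep peer ones."""
    blocks, current = [], []
    for line in dump.splitlines():
        # A new pointer line closes the block collected so far.
        if BLOCK_START.match(line) and current:
            blocks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return [block for block in blocks if "applet=<PEER>" in block]
```

Reading the saved file and joining the returned blocks with blank lines gives a
short extract containing only the peers sessions, ready to attach to a report.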

Best regards,
Willy

