Hi Cyril,

On Fri, Mar 12, 2010 at 05:03:15PM +0100, Cyril Bonté wrote:
> Hi Willy,
> 
> Our monitoring scripts use the unix socket to get haproxy's status. Sometimes 
> they detect haproxy DOWN when it's not really the case.

Do you know how long it takes to observe it? I'm currently
running it on 1.4.0-3 here. It's been running for the last 10 minutes
with 2, then 3 and now 10 concurrent scripts. For the record, in case
it matters, I'm using socat 1.6.0.0:
  # /root/bin/socat -V
  socat by Gerhard Rieger - see www.dest-unreach.org
  socat version 1.6.0.0 on Oct 28 2007 21:29:34
     running on Linux version #1 Sun Jan 31 00:55:16 CET 2010, release 2.4.37-wt3-fw, machine i686
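
For anyone who wants to run the same kind of concurrent test without
socat, here is a minimal sketch in Python. It is not the script used
above. The function names are made up for the illustration, the socket
path is whatever your "stats socket" setting points to, and it assumes
haproxy closes the connection once the reply has been sent:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def query(path, command="show stat", timeout=5.0):
    """Send one command to the stats socket and return the whole reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # end of stream: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def count_empty_replies(path, workers=10, rounds=100):
    """Run `workers` clients at once, `rounds` times; count empty replies."""
    empty = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            replies = pool.map(lambda _: query(path), range(workers))
            empty += sum(1 for reply in replies if not reply)
    return empty
```

Calling count_empty_replies("/var/run/haproxy.stat") (path given only as
an example) returns the number of requests that got nothing back. Any
non-zero value would match the symptom described in the report.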

> After some tests, it appears that 2 concurrent accesses break things : one of 
> the request receives an empty reply. I wonder if it can be a more severe 
> issue when someone uses administration commands (set weight, disable/enable 
> server,...)

I'm not worried about that because those operations are atomic: either
they are processed or they are not, so there is no risk of leaving them
in a half-processed state. However, if your scripts rely on those
commands having completed successfully, then yes, there's a risk that a
command is not taken into account (just as with a process being
restarted, BTW).

> At this step I can't give you much more information but I'll look at the code 
> this week-end.

OK.

> Something interesting, I could reproduce the issue with haproxy 1.4.1, 
> 1.3.23, 1.3.19 and 1.3.16 (only tested these ones) but there's no error with 
> 1.3.15.x versions (tested with 1.3.15.12, 1.3.15.11, 1.3.15.10 and 1.3.15.7).
> Maybe this can remind you some modifications of code.

Not much, as most of the rework happened between 1.3 and 1.4. In fact,
part of the work also happened between 1.3.15 and 1.3.16, but it
was the low-level I/O, which is now shared with TCP/HTTP. It would be
nice to try with "strace socat" instead of "socat" alone. I wonder
if it's just a scheduling issue sometimes causing socat to close its
output channel after sending the request and before receiving the
response (as we commonly have with netcat). But from my experience,
socat does not seem to abort any transfer after a unidirectional
close. So that would indicate that haproxy's stats output stops
if the input channel closes, which I don't think is the case at
all.
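
One way to check the half-close hypothesis independently of socat is to
send the same request twice, once keeping the write side open and once
shutting it down right after the request, then compare the reply sizes.
The sketch below is only an illustration: query_stats and
compare_half_close are hypothetical names, the socket path has to be
supplied by the caller, and it assumes haproxy closes the connection
after the reply:

```python
import socket


def query_stats(path, command="show stat", half_close=False, timeout=5.0):
    """Send one command; optionally close our write side before reading."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(command.encode("ascii") + b"\n")
        if half_close:
            # Unidirectional close: we stop sending but can still receive.
            sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # end of stream: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def compare_half_close(path, command="show stat"):
    """Return (reply size without half-close, reply size with half-close)."""
    normal = query_stats(path, command, half_close=False)
    closed = query_stats(path, command, half_close=True)
    return len(normal), len(closed)
```

If the second number is regularly zero or smaller than the first, the
output does stop when the input channel closes. If both stay equal over
many runs, the cause is more likely elsewhere.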

Regards,
Willy

