On Friday 12 March 2010 20:17:14, Willy Tarreau wrote:
> Hi Cyril,
>
> On Fri, Mar 12, 2010 at 05:03:15PM +0100, Cyril Bonté wrote:
> > Hi Willy,
> >
> > Our monitoring scripts use the unix socket to get haproxy's status.
> > Sometimes they detect haproxy DOWN when it's not really the case.
>
> Do you know how much time it takes to observe it ? I'm currently

It's random, but generally I don't have to wait more than 10 seconds (fewer
than 1500 loops at home).

> running it on 1.4.0-3 here. It's been running for the last 10 minutes
> with 2, then 3 and now 10 concurrent scripts. For the record in case
> that matters, it's running on socat 1.6.0.0 :
> # /root/bin/socat -V
> socat by Gerhard Rieger - see www.dest-unreach.org
> socat version 1.6.0.0 on Oct 28 2007 21:29:34

At work it should be version 1.6.0.1 (Debian lenny package).
Tonight, my tests were done with version 1.7.1.2.

> running on Linux version #1 Sun Jan 31 00:55:16 CET 2010, release
> 2.4.37-wt3-fw, machine i686

I don't think this makes a big difference, but my tests were done with
2.6.{18,24,31,33} kernels.

> Not much as most of the rework happened between 1.3 and 1.4. In fact,
> some part of the work also happened between 1.3.15 and 1.3.16 but it
> was the low-level I/O which is now common with TCP/HTTP. It would be
> nice to try with "strace socat" instead of "socat" alone. I wonder
> if it's just a scheduling issue sometimes causing socat to close its
> output channel after sending the request and before receiving the
> response (as we commonly have with netcat).

This might be the case, but then it's strange that it doesn't happen with
non-concurrent accesses.

Working trace:
...
stat("/tmp/haproxy.socket", {st_mode=S_IFSOCK|0755, st_size=0, ...}) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 3
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
connect(3, {sa_family=AF_FILE, path="/tmp/haproxy.socket"}, 21) = 0
getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff6290e590) = -1 EINVAL (Invalid argument)
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff6290e590) = -1 EINVAL (Invalid argument)
select(4, [0 3], [1 3], [], NULL) = 3 (in [0], out [1 3])
read(0, "show info\n", 8192) = 10
write(3, "show info\n", 10) = 10
select(4, [0 3], [3], [], NULL) = 3 (in [0 3], out [3])
read(3, "Name: HAProxy\nVersion: 1.3.16\nRe"..., 8192) = 255
write(1, "Name: HAProxy\nVersion: 1.3.16\nRe"..., 255) = 255
read(0, "", 8192) = 0
shutdown(3, 1 /* send */) = 0
select(4, [3], [1], [], {0, 500000}) = 2 (in [3], out [1], left {0, 499998})
read(3, "", 8192) = 0
shutdown(3, 1 /* send */) = 0
shutdown(3, 2 /* send and receive */) = 0
exit_group(0) = ?

Non-working one:
stat("/tmp/haproxy.socket", {st_mode=S_IFSOCK|0755, st_size=0, ...}) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 3
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
connect(3, {sa_family=AF_FILE, path="/tmp/haproxy.socket"}, 21) = 0
getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff5716e590) = -1 EINVAL (Invalid argument)
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff5716e590) = -1 EINVAL (Invalid argument)
select(4, [0 3], [1 3], [], NULL) = 3 (in [0], out [1 3])
read(0, "show info\n", 8192) = 10
write(3, "show info\n", 10) = 10
select(4, [0 3], [3], [], NULL) = 2 (in [0], out [3])
read(0, "", 8192) = 0
shutdown(3, 1 /* send */) = 0
select(4, [3], [], [], {0, 500000}) = 1 (in [3], left {0, 499998})
read(3, "", 8192) = 0
shutdown(3, 1 /* send */) = 0
shutdown(3, 2 /* send and receive */) = 0
exit_group(0) = ?
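
In the non-working trace, the send side of fd 3 is shut down before anything
has been read from it, and the next read returns 0 bytes. To test without that
ordering, a client can keep its write side open and just read until the peer
closes. Here is a minimal sketch in Python, not what our monitoring scripts
currently run: the name `query_stats`, its parameters and the 5 second timeout
are made up for illustration, and it assumes the stats socket closes the
connection by itself once the reply has been sent.

```python
import socket


def query_stats(path, command="show info", timeout=5.0):
    """Send one command to a UNIX stats socket and return the whole reply.

    Hypothetical helper: the write side is never shut down, so the reply
    cannot be cut short by a half-close coming from the client.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        # Assumption: the peer closes after answering. The timeout avoids
        # blocking forever if it does not.
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(command.encode() + b"\n")

        chunks = []
        while True:
            data = sock.recv(8192)
            if not data:  # EOF: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```

Calling `query_stats("/tmp/haproxy.socket")` in a loop from several concurrent
processes should show whether empty replies still occur when the client never
closes first.
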
> But from my experience,
> socat does not seem to abort any transfer after a unidirectional
> close. So that would indicate that haproxy's stats output stops
> if the input channel closes, which I don't think is the case at
> all.
>
> Regards,
> Willy
>
>
--
Cyril Bonté