Re: Truncated health check response from real servers

2010-07-12 Thread Anze
Hi Willy,

On Saturday 10 July 2010, Willy Tarreau wrote:
  Actually, I am not that much interested in the patch per-se, I would just
  like to check the body of the check-response and compare it to a string
  for more reliable checks.
 
 It's what the patch does.

Hmmm, I thought it was bigger... 

  I have included my original mail (and reasoning) on the
  subject below. It's not on top of my priority list right now either,
  still... sooner or later we might get bitten by it.
 
  I'd love to help out coding, but sockets and C++ are not my strong points
  (though I am comfortable in C).
 
 So you're welcome, haproxy is plain C, and I don't understand C++ either
  ;-)

How did I miss that? :-D

Ok, I'll take a look at it then. Thanks for updating the patch and for 
pointers on where to look for problematic sections. 

I can't promise it will be soon though, we have a lot of work right now... but 
I'll do my best.

Thanks,

Anze





Re: Truncated health check response from real servers

2010-07-12 Thread Willy Tarreau
Hi Anze,

On Mon, Jul 12, 2010 at 09:28:09AM +0200, Anze wrote:
 Hi Willy,
 
 On Saturday 10 July 2010, Willy Tarreau wrote:
   Actually, I am not that much interested in the patch per-se, I would just
   like to check the body of the check-response and compare it to a string
   for more reliable checks.
  
  It's what the patch does.
 
 Hmmm, I thought it was bigger... 

It has been quite simplified with Nick's work on buffering server checks.
I'd say that I too recalled a bigger one ;-)

 Ok, I'll take a look at it then. Thanks for updating the patch and for 
 pointers on where to look for problematic sections. 
 
 I can't promise it will be soon though, we have a lot of work right now... 
 but 
 I'll do my best.

You're welcome. Don't hurry anyway, this patch has been lying around for more
than 1 year now, there's no emergency anymore !

Cheers,
Willy




Re: Truncated health check response from real servers

2010-07-10 Thread Willy Tarreau
Hi Anze,

On Fri, Jul 09, 2010 at 01:48:12AM +0200, Anze wrote:
 
  Indeed it never made it in. Each time I look at it, I realize that
  cleaning it up and fixing it will take more time than what I have
  available to work on it :-/
 
 I can understand that completely. :)
 
 Actually, I am not that much interested in the patch per-se, I would just 
 like 
 to check the body of the check-response and compare it to a string for more 
 reliable checks.

It's what the patch does.

 I have included my original mail (and reasoning) on the 
 subject below. It's not on top of my priority list right now either, still... 
 sooner or later we might get bitten by it. 
 
 I'd love to help out coding, but sockets and C++ are not my strong points 
 (though I am comfortable in C).

So you're welcome, haproxy is plain C, and I don't understand C++ either ;-)

 I can help with testing though if needed. Let 
 me know if it helps and if/when you find time for this. :)
 
 Or, if you wish, I can take Nick's patch and extract just the parts with 
 checking the response body? Can you point out some of the errors so I can 
 avoid them?

I've rediffed the patch against 1.4.8, you can find it here :

http://haproxy.1wt.eu/download/patches/http-ecv-checks-1.4.8.diff

Most of the things I'm particularly concerned are in httpchk_expect() :

  - multiple calls to strstr(), strncmp() or regexec() on a non-null
terminated string. It only works if the buffer contained a null
from a previous use, otherwise it will happily overflow the buffer !
Either we find how to properly bound the functions, though I'm not
sure we can for regexec(), or we force a '\0' at the end of the data
received from the server, possibly truncating the response by one
byte if the buffer is full.

  - a calloc() call inside the check code ! We don't want to perform
dynamic memory allocation outside of the pools once the code is
running. Also, the return is not checked. The proper way to replace
this is to use trash[] as a temporary variable, it's designed for
this.

  - working mode checked with strcmp() ! This is particularly slow and
complicated, I'd rather have enums that are correctly set when the
config is parsed, and make use of them normally with a swich/case
of even an if statement.

  - one or a few inline variable declarations which will break build
on older compilers still in use in some embedded systems (easily
fixed though).

As you see, there's not even any socket work on this, so if you feel
comfortable with this, feel free to contribute, I think this is the
patch that received the most different contributors' work.

Once it's OK, I'm willing to merge it into next 1.4 stable because I
know that several people would like to use it, and it doesn't affect
other parts of code, so there's no risk of regression if it breaks.

Thanks,
Willy




Re: Truncated health check response from real servers

2010-07-08 Thread Anze

 Indeed it never made it in. Each time I look at it, I realize that
 cleaning it up and fixing it will take more time than what I have
 available to work on it :-/

I can understand that completely. :)

Actually, I am not that much interested in the patch per-se, I would just like 
to check the body of the check-response and compare it to a string for more 
reliable checks. I have included my original mail (and reasoning) on the 
subject below. It's not on top of my priority list right now either, still... 
sooner or later we might get bitten by it. 

I'd love to help out coding, but sockets and C++ are not my strong points 
(though I am comfortable in C). I can help with testing though if needed. Let 
me know if it helps and if/when you find time for this. :)

Or, if you wish, I can take Nick's patch and extract just the parts with 
checking the response body? Can you point out some of the errors so I can 
avoid them?

Thanks,

Anze


On Friday 12 March 2010, Anze wrote:
 Hi!
 
 First of all: Willy Tarreau  others, thank you for a great piece of
  software! It just works. :)
 
 PHP has a bug when errors in PHP code do not always issue HTTP error status
 code to be sent, but instead return status 200
 (http://bugs.php.net/bug.php?id=50921)
 
 The problem is that when haproxy calls check.php, the scrpt itself is not
  run if there is an error in it (or in the includes it needs for the
  checks). So this check is not really reliable.
 
 I know this is not really haproxy's problem (the bug is on the PHP side),
  but reliability would be much higher if haproxy could compare the response
  body to some string (for instance, server must return really ok in
  request body). That way the bug in interpreter could never cause a false
  positive.
 
 Example PHP script:
 ?
   // this fails:
   call_some_undefined_function();
   // this is not executed:
   echo really ok
 ?
 
 This fails silently and return an empty document (or document with error
 description, depending on PHP config) with status 200.
 
 As a side note, that would also allow us to catch PHP configuration errors,
 such as short_tags set to off - in which case the previous code would be
 just listed (with status 200!), not executed. When compared to really ok,
  it would fail.
 
 Any chance this can be done with haproxy?
 
 Thanks,
 
 Anze
 


On Thursday 08 July 2010, Willy Tarreau wrote:
 On Wed, Jul 07, 2010 at 11:37:48AM +0200, Anze wrote:
  Hi all,
 
  We have tried upgrading our haproxy version to 1.4.8 and found than this
  configuration setting fails:
http-check expect string check_is_ok
  with message:
'http-check' only supports 'disable-on-404'
 
  I guess the patch never made it to stable (yet?). Are there any plans to
  add it to the trunk?
 
 Indeed it never made it in. Each time I look at it, I realize that
 cleaning it up and fixing it will take more time than what I have
 available to work on it :-/
 
 It's still available online though, but I would really not run it
 if I were you, as it still lacks some memory bounds checkings.
 
 Regards,
 Willy
 





Re: Truncated health check response from real servers

2010-07-07 Thread Anze
Hi all,

We have tried upgrading our haproxy version to 1.4.8 and found than this 
configuration setting fails:
  http-check expect string check_is_ok
with message:
  'http-check' only supports 'disable-on-404'

I guess the patch never made it to stable (yet?). Are there any plans to add 
it to the trunk?

Thanks,

Anze



On Wednesday 17 March 2010, Cyril Bonté wrote:
 Hi all,
 
 Le Mardi 16 Mars 2010 21:35:10, Willy Tarreau a écrit :
  I'm now gathering my changes and committing your patch with the small
  fixes above. That way we can concentrate on ECV.
 
 I've played a little with this commits.
 I added some traces to see what happens when a timeout occurs and noticed
  that the response is concatenated after the response of the previous timed
  out check, until the buffer is full. This can occur when the server
  started to send a response but was too long to finish or if a check uses
  HTTP/1.1 on a server with a HTTP Keep-Alive longer than the check timeout
  (adding Connection: close could help prevent this, maybe it should be
  added in the option httpchk documentation).
 
 The expect option from the ECV patch can be affected by this.
 
 The patch provided in attachment resets the buffer after a timeout bur I
  wonder if it can't be done in set_server_check_status(). Also, to save a
  little memory, the check_data buffer is only allocated for the servers
  that are checked.
 




Re: Truncated health check response from real servers

2010-03-17 Thread Cyril Bonté
Hi all,

Le Mardi 16 Mars 2010 21:35:10, Willy Tarreau a écrit :
 I'm now gathering my changes and committing your patch with the small
 fixes above. That way we can concentrate on ECV.

I've played a little with this commits.
I added some traces to see what happens when a timeout occurs and noticed that 
the response is concatenated after the response of the previous timed out 
check, until the buffer is full.
This can occur when the server started to send a response but was too long to 
finish or if a check uses HTTP/1.1 on a server with a HTTP Keep-Alive longer 
than the check timeout (adding Connection: close could help prevent this, 
maybe it should be added in the option httpchk documentation).

The expect option from the ECV patch can be affected by this.

The patch provided in attachment resets the buffer after a timeout bur I wonder 
if it can't be done in set_server_check_status().
Also, to save a little memory, the check_data buffer is only allocated for the 
servers that are checked.

-- 
Cyril Bonté
diff -Naur haproxy-ss-20100317/src/cfgparse.c haproxy-ss-20100317-checks/src/cfgparse.c
--- haproxy-ss-20100317/src/cfgparse.c	2010-03-16 22:57:27.0 +0100
+++ haproxy-ss-20100317-checks/src/cfgparse.c	2010-03-17 18:37:21.0 +0100
@@ -3116,12 +3116,6 @@
 			newsrv-curfd = -1;		/* no health-check in progress */
 			newsrv-health = newsrv-rise;	/* up, but will fall down at first failure */
 
-			/* Allocate buffer for partial check results... */
-			if ((newsrv-check_data = calloc(BUFSIZE, sizeof(char))) == NULL) {
-Alert(parsing [%s:%d] : out of memory while allocating check buffer.\n, file, linenum);
-err_code |= ERR_ALERT | ERR_ABORT;
-goto out;
-			}
 			cur_arg = 3;
 		} else {
 			newsrv = curproxy-defsrv;
@@ -3563,6 +3557,13 @@
 goto out;
 			}
 
+			/* Allocate buffer for partial check results... */
+			if ((newsrv-check_data = calloc(BUFSIZE, sizeof(char))) == NULL) {
+Alert(parsing [%s:%d] : out of memory while allocating check buffer.\n, file, linenum);
+err_code |= ERR_ALERT | ERR_ABORT;
+goto out;
+			}
+
 			newsrv-check_status = HCHK_STATUS_INI;
 			newsrv-state |= SRV_CHECKED;
 		}
diff -Naur haproxy-ss-20100317/src/checks.c haproxy-ss-20100317-checks/src/checks.c
--- haproxy-ss-20100317/src/checks.c	2010-03-16 22:57:27.0 +0100
+++ haproxy-ss-20100317-checks/src/checks.c	2010-03-17 18:43:12.0 +0100
@@ -1345,6 +1345,9 @@
 	else	/* HTTP, SMTP */
 		set_server_check_status(s, HCHK_STATUS_L7TOUT, NULL);
 }
+/* Reset the check buffer... */
+*s-check_data = '\0';
+s-check_data_len = 0;
 			}
 
 			//fprintf(stderr, process_chk: 10\n);
diff -Naur haproxy-ss-20100317/src/haproxy.c haproxy-ss-20100317-checks/src/haproxy.c
--- haproxy-ss-20100317/src/haproxy.c	2010-03-16 22:57:27.0 +0100
+++ haproxy-ss-20100317-checks/src/haproxy.c	2010-03-17 18:38:08.0 +0100
@@ -859,7 +859,9 @@
 
 			free(s-id);
 			free(s-cookie);
-			free(s-check_data);
+			if (s-check_data) {
+free(s-check_data);
+			}
 			free(s);
 			s = s_next;
 		}/* end while(s) */


Re: Truncated health check response from real servers

2010-03-17 Thread Cyril Bonté
Le Mercredi 17 Mars 2010 21:12:45, Willy Tarreau a écrit :
 (...) I don't
 agree with restting the buffer or even considering we have an error
 when a session does not close, because it is a regression. For instance,
 all people using HTTP/1.1 checks will see a problem here.

hey right, I suddenly realize I forgot to recheck that !

 Rather, I
 think we should start the parsing as soon as the response is large
 enough to be parsed, and only declare a parse error if we reach the
 end of response on a closed response. That way we can still parse
 status codes as soon as we get the first packet for most cases, but
 we can also wait for more data. It means that for some rare cases,
 we can parse the response multiple times waiting for it to be complete,
 but that's rather rare and right now those cases don't work at all.
 
 I'll see what can be done.

That looks much better, no need to wait for a complete response if only the 
status is checked :)
And the resource will be released faster in most of the cases.

-- 
Cyril Bonté



Re: Truncated health check response from real servers

2010-03-16 Thread nick
diff -ur haproxy-1.4.1/include/types/proxy.h haproxy-1.4.1-ecv-test/include/types/proxy.h
--- haproxy-1.4.1/include/types/proxy.h	2010-03-04 22:39:19.0 +
+++ haproxy-1.4.1-ecv-test/include/types/proxy.h	2010-03-15 10:15:40.0 +
@@ -137,6 +137,8 @@
 #define PR_O2_MYSQL_CHK 0x0002  /* use MYSQL check for server health */
 #define PR_O2_USE_PXHDR 0x0004  /* use Proxy-Connection for proxy requests */
 #define PR_O2_CHK_SNDST 0x0008  /* send the state of each server along with HTTP health checks */
+#define PR_O2_EXPECT	0x0010	/* http-check expect sth */
+#define PR_O2_NOEXPECT	0x0020	/* http-check expect ! sth */
 /* end of proxy-options2 */
 
 /* bits for sticking rules */
@@ -274,6 +276,9 @@
 	int grace;/* grace time after stop request */
 	char *check_req;			/* HTTP or SSL request to use for PR_O_HTTP_CHK|PR_O_SSL3_CHK */
 	int check_len;/* Length of the HTTP or SSL3 request */
+	char *expect_str;			/* http-check expected content */
+	regex_t *expect_regex;			/* http-check expected content */
+	char *expect_type;			/* type of http-check, such as status, string */
 	struct chunk errmsg[HTTP_ERR_SIZE];	/* default or customized error messages for known errors */
 	int uuid;/* universally unique proxy ID, used for SNMP */
 	unsigned int backlog;			/* force the frontend's listen backlog */
diff -ur haproxy-1.4.1/include/types/server.h haproxy-1.4.1-ecv-test/include/types/server.h
--- haproxy-1.4.1/include/types/server.h	2010-03-04 22:39:19.0 +
+++ haproxy-1.4.1-ecv-test/include/types/server.h	2010-03-15 10:15:40.0 +
@@ -147,6 +147,9 @@
 	struct freq_ctr sess_per_sec;		/* sessions per second on this server */
 	int puid;/* proxy-unique server ID, used for SNMP */
 
+	char *check_data;			/* storage of partial check results */
+	int check_data_len;			/* length of partial check results stored in check_data */
+
 	struct {
 		const char *file;		/* file where the section appears */
 		int line;			/* line where the section appears */
diff -ur haproxy-1.4.1/src/cfgparse.c haproxy-1.4.1-ecv-test/src/cfgparse.c
--- haproxy-1.4.1/src/cfgparse.c	2010-03-04 22:39:19.0 +
+++ haproxy-1.4.1-ecv-test/src/cfgparse.c	2010-03-15 10:15:40.0 +
@@ -2874,8 +2874,65 @@
 			/* enable emission of the apparent state of a server in HTTP checks */
 			curproxy-options2 |= PR_O2_CHK_SNDST;
 		}
+		else if (strcmp(args[1], expect) == 0) {
+			if (strcmp(args[2], status) == 0 || strcmp(args[2], string) == 0) {
+curproxy-options2 |= PR_O2_EXPECT;
+if (*(args[3]) == 0) {
+	Alert(parsing [%s:%d] : '%s %s %s' expects  regex as an argument.\n, 
+		file, linenum, args[0], args[1], args[2]);
+	return -1;
+}
+curproxy-expect_type = strdup(args[2]);
+curproxy-expect_str = strdup(args[3]);
+			}
+else if (strcmp(args[2], rstatus) == 0 || strcmp(args[2], rstring) == 0) {
+curproxy-options2 |= PR_O2_EXPECT;
+if (*(args[3]) == 0) {
+Alert(parsing [%s:%d] : '%s %s %s' expects  regex as an argument.\n,
+file, linenum, args[0], args[1], args[2]);
+return -1;
+}
+curproxy-expect_regex = calloc(1, sizeof(regex_t));
+if (regcomp(curproxy-expect_regex, args[3], REG_EXTENDED) != 0) {
+Alert(parsing [%s:%d] : bad regular expression '%s'.\n, file, linenum, args[0]);
+return -1;
+}
+curproxy-expect_type = strdup(args[2]);
+}
+else if (strcmp(args[2], !) == 0 ) {
+curproxy-options2 |= PR_O2_NOEXPECT;
+if (strcmp(args[3], status) == 0 || strcmp(args[3], string) == 0) {
+	if (*(args[4]) == 0) {
+	Alert(parsing [%s:%d] : '%s %s %s %s' expects  regex as an argument.\n,
+	file, linenum, args[0], args[1], args[2], args[3]);
+   	 	return -1;
+   		 }
+	curproxy-expect_type = strdup(args[3]);
+	curproxy-expect_str = strdup(args[4]);
+}
+else if (strcmp(args[3], rstatus) == 0 || strcmp(args[3], rstring) == 0) {
+	if (*(args[4]) == 0) {
+	Alert(parsing [%s:%d] : '%s %s %s %s' expects  regex as an argument.\n,
+	file, linenum, args[0], args[1], args[2], args[3]);
+   	 	return -1;
+   		 }
+
+

Re: Truncated health check response from real servers

2010-03-16 Thread Willy Tarreau
Hi Nick,

Thanks for the update. I've quickly reviewed it and noticed
some of the issues of the initial ECV patch (though I don't
remember them all, I'll have to dig into my mailbox). I'm
putting a few examples below.

What I can propose you is to proceed in 3 phases :

  - I will try to extract the two features from your patch
(response reassembly and ECV), and apply the first one
to next 1.4.

  - I'll try to fix the remaining issues of the ECV code and
post it for review.

  - then once everyone is OK and we get satisfying results, we
merge it into another 1.4, so that we'll finally get it into
1.4 stable.

There's also something we lose with the ECV patch : if ECV is in
use, then we can't enable the disable-on-404 feature anymore. I
think we could still combine them both by checking for the 404
when the response does not match the criterion. I'm not exactly
sure how but we'll see that later.

Some issues below :

On Tue, Mar 16, 2010 at 03:50:46PM +, n...@loadbalancer.org wrote:
 + else if (strcmp(args[1], expect) == 0) {
 + if (strcmp(args[2], status) == 0 || strcmp(args[2], 
 string) == 0) {
 + curproxy-options2 |= PR_O2_EXPECT;
 + if (*(args[3]) == 0) {
 + Alert(parsing [%s:%d] : '%s %s %s' 
 expects  regex as an argument.\n, 

warning, it's not a regex but a string in the error message.

 + char *contentptr = strstr(s-check_data, \r\n\r\n);

here we can go past the end of the buffer. = segfault if pattern missing.

 + /* Check the response content against the supplied string
 +  * or regex... */
 + if (strcmp(s-proxy-expect_type, string) == 0)

it's a waste of CPU cycles to compare strings for just a type which
fits in an enum.

Regards,
Willy




Re: Truncated health check response from real servers

2010-03-16 Thread Willy Tarreau
On Tue, Mar 16, 2010 at 06:22:09PM +0100, Willy Tarreau wrote:
 What I can propose you is to proceed in 3 phases :
 
   - I will try to extract the two features from your patch
 (response reassembly and ECV), and apply the first one
 to next 1.4.

OK, your code was clean and the two parts were really distinct.
It was a child's game to split them.

Doing so, I discovered one minor and one medium issues which I
fixed. The minor one is that detecting the end of a response
now always requires another poll() call, which can be expensive
on LBs with hundreds of servers, especially since in HTTP the
close is almost always there pending after the response :

  20:20:03.958207 recv(7, HTTP/1.1 200\r\nConnection: close\r\n..., 8030, 0) 
= 145
  20:20:03.958365 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
  20:20:03.958543 gettimeofday({1268767203, 958626}, NULL) = 0
  20:20:03.958694 recv(7, ..., 7885, 0) = 0
  20:20:03.958833 shutdown(7, 2 /* send and receive */) = 0

I've arranged the recv() call to be able to read up to EAGAIN or
error or end, and now we get both the response and the close in
the same call :

  20:29:58.797019 recv(7, HTTP/1.1 200\r\nConnection: close\r\n..., 8030, 0) 
= 145
  20:29:58.797182 recv(7, ..., 7885, 0) = 0
  20:29:58.797356 shutdown(7, 2 /* send and receive */) = 0

The medium issue was that by supporting multiple calls to recv(), we
woke up the bad old guy who sends us POLLERR status causing aborts
before recv() has a chance to read a full response. This happens on
the second recv() if the server has closed a bit quickly and sent an
RST :

  21:11:21.036600 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 993) = 1
  21:11:21.054361 gettimeofday({1268770281, 54467}, NULL) = 0
  21:11:21.054540 recv(7, H..., 8030, 0) = 1
  21:11:21.054694 recv(7, 0x967e759, 8029, 0) = -1 EAGAIN (Resource temporarily 
unavailable)
  21:11:21.054843 epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}}, 
8, 975) = 1
  21:11:21.060274 gettimeofday({1268770281, 60386}, NULL) = 0
  21:11:21.060454 close(7)= 0

The fix simply consists in removing this old obsolete test, and now it's
OK :

  21:11:59.402207 recv(7, H..., 8030, 0) = 1
  21:11:59.402362 recv(7, 0x8b5c759, 8029, 0) = -1 EAGAIN (Resource temporarily 
unavailable)
  21:11:59.402511 epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}}, 
8, 974) = 1
  21:11:59.407242 gettimeofday({1268770319, 407353}, NULL) = 0
  21:11:59.407425 recv(7, TTP/1.0 200 OK\r\n..., 8029, 0) = 16
  21:11:59.407606 recv(7, 0x8b5c769, 8013, 0) = -1 ECONNRESET (Connection reset 
by peer)
  21:11:59.407753 shutdown(7, 2 /* send and receive */) = -1 ENOTCONN 
(Transport endpoint is not connected)

The trained eye would have noticed that I got a two-packet HTTP response
with an RST detected before the second packet and that the check code
still managed to get it right. This is really good news !

I'm now gathering my changes and committing your patch with the small
fixes above. That way we can concentrate on ECV.

Cheers,
Willy




Re: Truncated health check response from real servers

2010-03-11 Thread Nick Chalk
Hello Willy.

On 8 March 2010 21:25, Willy Tarreau w...@1wt.eu wrote:
 On Mon, Mar 08, 2010 at 04:32:54PM +, Nick Chalk wrote:
 Attached is a patch against v1.4.1. It contains the updated ECV patch,
 and the hacks to work around check responses that span multiple
 packets.
 At first glance, it seems fine. I'll have to find my notes about the
 original patch to see if anything I spot back then is still present,
 but in my opinion it looks OK.

Thanks.

 I don't remember if there were any doc in the original patch either, I'll have
 to dig.

No, I don't think we have any documentation. I'll try to find some
time to add that to the patch.

Nick.

-- 
Nick Chalk.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/



Re: Truncated health check response from real servers

2010-03-08 Thread Nick Chalk
Hello Willy.

On 3 March 2010 20:31, Willy Tarreau w...@1wt.eu wrote:
 OK that's perfect then. If you don't manage to sort out your issue
 with small packets, do not hesitate to post your work in progress
 to the list, it often helps a lot to work iteratively.

The small-packet problem turned out to be a testing fault - the apache
servers I'm using for testing were not sending the whole page when I
had set a small MTU.

Attached is a patch against v1.4.1. It contains the updated ECV patch,
and the hacks to work around check responses that span multiple
packets.

It seems to be working fine in my tests, and handles check responses
that are bigger than the buffer.

I'd be grateful for anyone's comments and suggestions.

Nick.

-- 
Nick Chalk.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/
diff -ur haproxy-1.4.1/include/types/proxy.h haproxy-1.4.1-ecv/include/types/proxy.h
--- haproxy-1.4.1/include/types/proxy.h	2010-03-04 22:39:19.0 +
+++ haproxy-1.4.1-ecv/include/types/proxy.h	2010-03-08 10:38:22.0 +
@@ -137,6 +137,8 @@
 #define PR_O2_MYSQL_CHK 0x0002  /* use MYSQL check for server health */
 #define PR_O2_USE_PXHDR 0x0004  /* use Proxy-Connection for proxy requests */
 #define PR_O2_CHK_SNDST 0x0008  /* send the state of each server along with HTTP health checks */
+#define PR_O2_EXPECT	0x0010	/* http-check expect sth */
+#define PR_O2_NOEXPECT	0x0020	/* http-check expect ! sth */
 /* end of proxy-options2 */
 
 /* bits for sticking rules */
@@ -274,6 +276,9 @@
 	int grace;/* grace time after stop request */
 	char *check_req;			/* HTTP or SSL request to use for PR_O_HTTP_CHK|PR_O_SSL3_CHK */
 	int check_len;/* Length of the HTTP or SSL3 request */
+	char *expect_str;			/* http-check expected content */
+	regex_t *expect_regex;			/* http-check expected content */
+	char *expect_type;			/* type of http-check, such as status, string */
 	struct chunk errmsg[HTTP_ERR_SIZE];	/* default or customized error messages for known errors */
 	int uuid;/* universally unique proxy ID, used for SNMP */
 	unsigned int backlog;			/* force the frontend's listen backlog */
diff -ur haproxy-1.4.1/include/types/server.h haproxy-1.4.1-ecv/include/types/server.h
--- haproxy-1.4.1/include/types/server.h	2010-03-04 22:39:19.0 +
+++ haproxy-1.4.1-ecv/include/types/server.h	2010-03-08 10:38:22.0 +
@@ -147,6 +147,9 @@
 	struct freq_ctr sess_per_sec;		/* sessions per second on this server */
 	int puid;/* proxy-unique server ID, used for SNMP */
 
+	char *check_data;			/* storage of partial check results */
+	int check_data_len;			/* length of partial check results stored in check_data */
+
 	struct {
 		const char *file;		/* file where the section appears */
 		int line;			/* line where the section appears */
diff -ur haproxy-1.4.1/src/cfgparse.c haproxy-1.4.1-ecv/src/cfgparse.c
--- haproxy-1.4.1/src/cfgparse.c	2010-03-04 22:39:19.0 +
+++ haproxy-1.4.1-ecv/src/cfgparse.c	2010-03-08 10:38:22.0 +
@@ -2874,8 +2874,65 @@
 			/* enable emission of the apparent state of a server in HTTP checks */
 			curproxy-options2 |= PR_O2_CHK_SNDST;
 		}
+		else if (strcmp(args[1], expect) == 0) {
+			if (strcmp(args[2], status) == 0 || strcmp(args[2], string) == 0) {
+curproxy-options2 |= PR_O2_EXPECT;
+if (*(args[3]) == 0) {
+	Alert(parsing [%s:%d] : '%s %s %s' expects  regex as an argument.\n, 
+		file, linenum, args[0], args[1], args[2]);
+	return -1;
+}
+curproxy-expect_type = strdup(args[2]);
+curproxy-expect_str = strdup(args[3]);
+			}
+else if (strcmp(args[2], rstatus) == 0 || strcmp(args[2], rstring) == 0) {
+curproxy-options2 |= PR_O2_EXPECT;
+if (*(args[3]) == 0) {
+Alert(parsing [%s:%d] : '%s %s %s' expects  regex as an argument.\n,
+file, linenum, args[0], args[1], args[2]);
+return -1;
+}
+curproxy-expect_regex = calloc(1, sizeof(regex_t));
+if (regcomp(curproxy-expect_regex, args[3], REG_EXTENDED) != 0) {
+Alert(parsing [%s:%d] : bad regular expression '%s'.\n, file, linenum, args[0]);
+return -1;
+}
+curproxy-expect_type = strdup(args[2]);
+}
+else if (strcmp(args[2], !) == 0 ) {
+curproxy-options2 |= PR_O2_NOEXPECT;
+if (strcmp(args[3], status) == 0 || strcmp(args[3], string) == 0) {
+	if (*(args[4]) == 0) {
+	Alert(parsing [%s:%d] : '%s %s %s %s' expects  regex as an 

Re: Truncated health check response from real servers

2010-03-03 Thread Nick Chalk
Hello Willy.

On 2 March 2010 21:45, Willy Tarreau w...@1wt.eu wrote:
 If your quick ack already works for one single check, then simply
 allocate a buffer for each server in cfgparse.c, and have the
 checks functions use that server-specific buffer instead of
 trash.

Thanks for the pointer. I've added buffer and length variables to
struct server, and allocated space with the rest of the server
initialisation. Looks to be stable so far, although there's still a
problem with very small response packets - something I'll work on
tomorrow.

 If you come up with something working well, I'm willing to merge
 it into 1.4, so please do not hesitate to submit your work in
 progress. You'll also get more testers.

When I have something working and presentable, I'll submit the patch.

Thanks,
Nick.

-- 
Nick Chalk.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/



Re: Truncated health check response from real servers

2010-03-03 Thread Willy Tarreau
On Wed, Mar 03, 2010 at 08:12:09PM +, Nick Chalk wrote:
 Hello Willy.
 
 On 2 March 2010 21:45, Willy Tarreau w...@1wt.eu wrote:
  If your quick ack already works for one single check, then simply
  allocate a buffer for each server in cfgparse.c, and have the
  checks functions use that server-specific buffer instead of
  trash.
 
 Thanks for the pointer. I've added buffer and length variables to
 struct server, and allocated space with the rest of the server
 initialisation. Looks to be stable so far, although there's still a
 problem with very small response packets - something I'll work on
 tomorrow.
 
  If you come up with something working well, I'm willing to merge
  it into 1.4, so please do not hesitate to submit your work in
  progress. You'll also get more testers.
 
 When I have something working and presentable, I'll submit the patch.

OK that's perfect then. If you don't manage to sort out your issue
with small packets, do not hesitate to post your work in progress
to the list, it often helps a lot to work iteratively.

Cheers,
Willy




Re: Truncated health check response from real servers

2010-03-02 Thread Nick Chalk
Hello Willy.

On 25 February 2010 20:25, Willy Tarreau w...@1wt.eu wrote:
 On Mon, Feb 15, 2010 at 10:05:57AM +, Nick Chalk wrote:
 On 13 February 2010 10:40, Willy Tarreau w...@1wt.eu wrote:
  Indeed, with MSG_PEEK we have no way to tell the connection was closed.

 For the time being, I've hacked together a patch to get our customer
 up and running.

 I've allocated a new static character buffer, to store the
 intermediate results from the real server. I'm relying on recv()
 returning a length of 0 to indicate the server has closed the
 connection - not sure if that's a reliable method, but it seems to be
 repeatable.

 It will not always work with MSG_PEEK.

Sorry, I should have made it clear that I'd removed the MSG_PEEK flag.

 The other temporary solution will be to allocate a receive buffer to
 every server to store health check responses.

Hmm, I hadn't considered that multiple checks could be running in
parallel. I'll have a look at whether that can break my quick hack.

Thanks,
Nick.

-- 
Nick Chalk.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/



Re: Truncated health check response from real servers

2010-03-02 Thread Willy Tarreau
Hi Nick,

On Tue, Mar 02, 2010 at 12:25:06PM +, Nick Chalk wrote:
 Hello Willy.
 
 On 25 February 2010 20:25, Willy Tarreau w...@1wt.eu wrote:
  On Mon, Feb 15, 2010 at 10:05:57AM +, Nick Chalk wrote:
  On 13 February 2010 10:40, Willy Tarreau w...@1wt.eu wrote:
   Indeed, with MSG_PEEK we have no way to tell the connection was closed.
 
  For the time being, I've hacked together a patch to get our customer
  up and running.
 
  I've allocated a new static character buffer, to store the
  intermediate results from the real server. I'm relying on recv()
  returning a length of 0 to indicate the server has closed the
  connection - not sure if that's a reliable method, but it seems to be
  repeatable.
 
  It will not always work with MSG_PEEK.
 
 Sorry, I should have made it clear that I'd removed the MSG_PEEK flag.
 
  The other temporary solution will be to allocate a receive buffer to
  every server to store health check responses.
 
 Hmm, I hadn't considered that multiple checks could be running in
 parallel. I'll have a look at whether that can break my quick hack.

If your quick ack already works for one single check, then simply
allocate a buffer for each server in cfgparse.c, and have the
checks functions use that server-specific buffer instead of
trash. You can even allocate smaller buffers than trash, because
we don't necessarily need to receive and process large check
responses.

If you come up with something working well, I'm willing to merge
it into 1.4, so please do not hesitate to submit your work in
progress. You'll also get more testers.

Thanks,
Willy




Re: Truncated health check response from real servers

2010-02-25 Thread Willy Tarreau
Hi Nick,

On Mon, Feb 15, 2010 at 10:05:57AM +, Nick Chalk wrote:
 Hello Willy, Krzysztof.
 
 On 13 February 2010 10:40, Willy Tarreau w...@1wt.eu wrote:
  On Fri, Feb 12, 2010 at 05:47:41PM +0100, Krzysztof Ol??dzki wrote:
  There are several issues with the fix:
 
   - we need to check if connection is not closed, as it is pointless to
  use MSG_PEEK and restarting such check if there is no more data we are
  able to read
 
  Indeed, with MSG_PEEK we have no way to tell the connection was closed.
 
 For the time being, I've hacked together a patch to get our customer
 up and running.
 
 I've allocated a new static character buffer, to store the
 intermediate results from the real server. I'm relying on recv()
 returning a length of 0 to indicate the server has closed the
 connection - not sure if that's a reliable method, but it seems to be
 repeatable.

It will not always work with MSG_PEEK. It will only return that if
the server closes without sending any data. However if it sends some
data first, every subsequent recv(MSG_PEEK) will return those data,
so you'll never detect the close. That's what Krzysztof explained
when he said that we'd rely on timeouts even when servers return
short responses.

Maybe we could try to use some non-portable methods to try to detect
a close (eg: getsockopt(fd, SOL_SOCKET, SO_RCVBUF) to see if the input
buffer is closed, etc...). But I don't even think it would work, and
even if it did it would be terribly tricky.

The other temporary solution will be to allocate a receive buffer to
every server to store health check responses. This might not be that
hard in fact, and maybe we can even do that during 1.4 maintenance as
it would be a nice improvement with very little risk of regression.

Regards,
Willy




Re: Truncated health check response from real servers

2010-02-15 Thread Nick Chalk
Hello Willy, Krzysztof.

On 13 February 2010 10:40, Willy Tarreau w...@1wt.eu wrote:
 On Fri, Feb 12, 2010 at 05:47:41PM +0100, Krzysztof Olędzki wrote:
 There are several issues with the fix:

  - we need to check if connection is not closed, as it is pointless to
 use MSG_PEEK and restarting such check if there is no more data we are
 able to read

 Indeed, with MSG_PEEK we have no way to tell the connection was closed.

For the time being, I've hacked together a patch to get our customer
up and running.

I've allocated a new static character buffer, to store the
intermediate results from the real server. I'm relying on recv()
returning a length of 0 to indicate the server has closed the
connection - not sure if that's a reliable method, but it seems to be
repeatable.

  - some servers return empty description so increasing minimum response
 length prevents haproxy from accepting such checks. Of course, if you
 are not using such server, it should be safe to do it in your locally
 patched version, but we mustn't do it on a public version.

 In fact, we should re-parse the response each time we call recv(). As
 long as we don't find a complete response, we can wait. This still
 implies a non-trivial change to current code.

I decided not to run the content check for every received packet, as I
couldn't see an easy way to deal with the case where the string to
match is split between two packets.

Nick.

-- 
Nick Chalk.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/



Re: Truncated health check response from real servers

2010-02-13 Thread Willy Tarreau
On Fri, Feb 12, 2010 at 05:47:41PM +0100, Krzysztof Ol??dzki wrote:
 There are several issues with the fix:
 
  - we need to check if connection is not closed, as it is pointless to 
 use MSG_PEEK and restarting such check if there is no more data we are 
 able to read

Indeed, with MSG_PEEK we have no way to tell the connection was closed.

  - some servers return empty description so increasing minimum response 
 length prevents haproxy from accepting such checks. Of course, if you 
 are not using such server, it should be safe to do it in your locally 
 patched version, but we mustn't do it on a public version.

In fact, we should re-parse the response each time we call recv(). As
long as we don't find a complete response, we can wait. This still
implies a non-trivial change to current code.

  - it may interfere with other, non http checks.
 
 Long term we should implement the Willy's idea and merge http session 
 processing and http checks, but for now I'll try to fix it in 1.4 with 
 respect to mentioned above problems.

This part is even more important with ECV checks BTW.

Regards,
Willy




Re: Truncated health check response from real servers

2010-02-10 Thread Willy Tarreau
On Wed, Feb 10, 2010 at 10:56:14PM +, Nick Chalk wrote:
 Thanks Willy, Krzysztof.
 
 2010/2/10 Willy Tarreau w...@1wt.eu:
  Did you managed to fix the several remaining issues which could cause
  it to crash the process ?
 
 I believe so, following Cyril Bonté's suggestions last week. I'm still
 testing it, though.

OK, we talked with Cyril about all the issues in this patch, so it's
possible that you both finally got it right !

 2010/2/10 Krzysztof Ol??dzki o...@ans.pl:
  On 2010-02-10 23:02, Willy Tarreau wrote:
  Krzysztof Oledsky proposed a patch to make use of recv(MSG_PEEK)
  in order to leave incomplete data in kernel buffers instead of consuming
  it. I don't think it could have any side effect, you may want to try it.
  It was about 3-4 weeks ago on the list.
 
  Yep, it is worth to try it, but it is still a dirty fix. I have idea how to
  make it right, but haven't been able to find time to do it, yet.
 
 Thanks - I'll try that tomorrow, and report my findings.

Thanks!

 What was confusing me was that the test would succeed a fair
 proportion of the time!

That's typical of such packet-based parsing. In general, a good way
to check if this is what you're seeing is to slow down the machine
you run haproxy on (run a few while :; do :; done in parallel)
then see if it starts working again, due to the fact that when haproxy
gets the CPU, most of the time all packets have reached the kernel
buffers.

Regards,
Willy