On Tue, Mar 16, 2010 at 06:22:09PM +0100, Willy Tarreau wrote:
> What I can propose you is to proceed in 3 phases :
> 
>   - I will try to extract the two features from your patch
>     (response reassembly and ECV), and apply the first one
>     to next 1.4.

OK, your code was clean and the two parts were really distinct.
It was a child's game to split them.

Doing so, I discovered one minor and one medium issues which I
fixed. The minor one is that detecting the end of a response
now always requires another poll() call, which can be expensive
on LBs with hundreds of servers, especially since in HTTP the
close is almost always there pending after the response :

  20:20:03.958207 recv(7, "HTTP/1.1 200\r\nConnection: close\r\n"..., 8030, 0) 
= 145
  20:20:03.958365 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
  20:20:03.958543 gettimeofday({1268767203, 958626}, NULL) = 0
  20:20:03.958694 recv(7, ""..., 7885, 0) = 0
  20:20:03.958833 shutdown(7, 2 /* send and receive */) = 0

I've arranged the recv() call to be able to read up to EAGAIN or
error or end, and now we get both the response and the close in
the same call :

  20:29:58.797019 recv(7, "HTTP/1.1 200\r\nConnection: close\r\n"..., 8030, 0) 
= 145
  20:29:58.797182 recv(7, ""..., 7885, 0) = 0
  20:29:58.797356 shutdown(7, 2 /* send and receive */) = 0

The medium issue was that by supporting multiple calls to recv(), we
woke up the bad old guy who sends us POLLERR status causing aborts
before recv() has a chance to read a full response. This happens on
the second recv() if the server has closed a bit quickly and sent an
RST :

  21:11:21.036600 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 993) = 1
  21:11:21.054361 gettimeofday({1268770281, 54467}, NULL) = 0
  21:11:21.054540 recv(7, "H"..., 8030, 0) = 1
  21:11:21.054694 recv(7, 0x967e759, 8029, 0) = -1 EAGAIN (Resource temporarily 
unavailable)
  21:11:21.054843 epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}}, 
8, 975) = 1
  21:11:21.060274 gettimeofday({1268770281, 60386}, NULL) = 0
  21:11:21.060454 close(7)                = 0

The fix simply consists in removing this old obsolete test, and now it's
OK :

  21:11:59.402207 recv(7, "H"..., 8030, 0) = 1
  21:11:59.402362 recv(7, 0x8b5c759, 8029, 0) = -1 EAGAIN (Resource temporarily 
unavailable)
  21:11:59.402511 epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=7, u64=7}}}, 
8, 974) = 1
  21:11:59.407242 gettimeofday({1268770319, 407353}, NULL) = 0
  21:11:59.407425 recv(7, "TTP/1.0 200 OK\r\n"..., 8029, 0) = 16
  21:11:59.407606 recv(7, 0x8b5c769, 8013, 0) = -1 ECONNRESET (Connection reset 
by peer)
  21:11:59.407753 shutdown(7, 2 /* send and receive */) = -1 ENOTCONN 
(Transport endpoint is not connected)

The trained eye would have noticed that I got a two-packet HTTP response
with an RST detected before the second packet and that the check code
still managed to get it right. This is really good news !

I'm now gathering my changes and committing your patch with the small
fixes above. That way we can concentrate on ECV.

Cheers,
Willy


Reply via email to