On Tue, Jun 30, 2015 at 01:17:03PM +0100, Charlie Smurthwaite wrote:
> On 30/06/15 12:50, Charlie Smurthwaite wrote:
> >
> >Hi Willy,
> >
> >Please find attached a tgz containing the following:
> >
> >* Output from "strace -o log -tts 16384 -p $(haproxy_pid)"
> >* Full packet capture of all port-80 traffic (client is on 10.0.2.82)
> >* Log output from client identifying corrupt file
> >* Modified config file used in this test
> >
> >For this test, only IPv4 was used. Client side MSS is forced to 1300 
> >by an intermediate firewall.
> >
> >Please let me know if any further information would be useful.
> >
> >Charlie
> Looking at the strace, the following jumps out as being odd:
> 
> A partial write succeeds, writing the remaining data fails with EAGAIN. 
> The send is then retried with what looks like a corrupt buffer.

Excellent, you were faster than me, I've just found finder_test.rb in the
trace :-)

(...)

> 12:37:57.056999 sendto(3, 
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
(...)
> 0\0\0\0\0\0\0\0\0\0\0\0", 
> 2967, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 2967

Huh?

We find it here, pretty easily this time thanks to the long series of
zeroes :

13:37:57.056969 IP 185.22.208.18.80 > 10.0.2.82.55462: Flags [P.], seq 
46979:48267, ack 18299, win 652, options [nop,nop,TS val 28524
        0x0000:  4500 053c 2cb1 4000 4006 7390 b916 d012  E..<,.@[email protected].....
        0x0010:  0a00 0252 0050 d8a6 2420 bb1b 99da d9b9  ...R.P..$.......
        0x0020:  8018 028c 8d2a 0000 0101 080a aa05 3033  .....*........03
        0x0030:  35cd 5460 4854 5450 2f31 2e31 2032 3030  5.T`HTTP/1.1.200
        0x0040:  204f 4b0d 0a53 6572 7665 723a 206e 6769  .OK..Server:.ngi
        0x0050:  6e78 2f31 2e38 2e30 0d0a 4461 7465 3a20  nx/1.8.0..Date:.
        0x0060:  5475 652c 2033 3020 4a75 6e20 3230 3135  Tue,.30.Jun.2015
        0x0070:  2031 313a 3337 3a35 3720 474d 540d 0a43  .11:37:57.GMT..C
        0x0080:  6f6e 7465 6e74 2d54 7970 653a 2074 6578  ontent-Type:.tex
        0x0090:  742f 706c 6169 6e0d 0a54 7261 6e73 6665  t/plain..Transfe
        0x00a0:  722d 456e 636f 6469 6e67 3a20 6368 756e  r-Encoding:.chun
        0x00b0:  6b65 640d 0a53 7461 7475 733a 2032 3030  ked..Status:.200
        0x00c0:  204f 4b0d 0a6c 6173 742d 6d6f 6469 6669  .OK..last-modifi
        0x00d0:  6564 3a20 5765 642c 2030 3320 4a75 6e20  ed:.Wed,.03.Jun.
        0x00e0:  3230 3135 2030 373a 3034 3a33 3320 474d  2015.07:04:33.GM
        0x00f0:  540d 0a65 7461 673a 2022 312f 2f61 6374  T..etag:."1//act
        0x0100:  6976 6572 6573 6f75 7263 652f 7465 7374  iveresource/test
        0x0110:  2f63 6173 6573 2f66 696e 6465 725f 7465  /cases/finder_te
        0x0120:  7374 2e72 6222 0d0a 6163 6365 7074 2d72  st.rb"..accept-r
        0x0130:  616e 6765 733a 2062 7974 6573 0d0a 4361  anges:.bytes..Ca
        0x0140:  6368 652d 436f 6e74 726f 6c3a 2070 7269  che-Control:.pri
        0x0150:  7661 7465 2c20 6d61 782d 6167 653d 3630  vate,.max-age=60
        0x0160:  3438 3030 0d0a 582d 5261 636b 2d43 6163  4800..X-Rack-Cac
        0x0170:  6865 3a20 6d69 7373 0d0a 0d0a 6634 620d  he:.miss....f4b.
        0x0180:  0a72 6571 7569 7265 2027 6162 7374 7261  .require.'abstra
        0x0190:  6374 5f75 6e69 7427 0a72 6571 7569 7265  ct_unit'.require
        0x01a0:  2022 6669 7874 7572 6573 2f70 6572 736f  ."fixtures/perso
        0x01b0:  6e22 0a72 6571 7569 7265 2022 6669 7874  n".require."fixt
        0x01c0:  7572 6573 2f63 7573 746f 6d65 7222 0a72  ures/customer".r
        0x01d0:  6571 7569 7265 2022 6669 7874 7572 6573  equire."fixtures
        0x01e0:  2f73 7472 6565 745f 6164 6472 6573 7322  /street_address"
(...)
        0x04f0:  2061 6c6c 203d 2050 6572 736f 6e2e 616c  .all.=.Person.al
        0x0500:  6c0a 2020 2020 6173 7365 7274 5f65 7175  l.....assert_equ
        0x0510:  616c 2032 2c20 616c 6c2e 7369 7a65 0a20  al.2,.all.size..
        0x0520:  2020 2061 7373 6572 745f 6b69 6e64 5f6f  ...assert_kind_o
        0x0530:  6620 5065 7273 6f6e 2c20 616c            f.Person,.al
13:37:57.057052 IP 185.22.208.18.80 > 10.0.2.82.55462: Flags [.], seq 
48267:49555, ack 18299, win 652, options [nop,nop,TS val 285246
        0x0000:  4500 053c 2cb2 4000 4006 738f b916 d012  E..<,.@[email protected].....
        0x0010:  0a00 0252 0050 d8a6 2420 c023 99da d9b9  ...R.P..$..#....
        0x0020:  8010 028c 4479 0000 0101 080a aa05 3033  ....Dy........03
        0x0030:  35cd 5460 0000 0000 0000 0000 0000 0000  5.T`............
        0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0050:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0070:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0080:  0000 0000 0000 0000 0000 0000 0000 0000  ................

Looking at the send() code, it's pretty clear that there's no way the send
pointer can change there upon EAGAIN. Thus I'm heading towards a different
possibility : since we sent another request from the same buffer between
the EAGAIN and the send(0), I'm suspecting that the issue happens during the
realign of the response buffer. It's possible that pending data are improperly
dealt with and result in buf->ptr + buf->o to point to a wrong location in the
buffer, in your case probably the end of the buffer full of zeroes since it
has not been filled yet.

I'm now trying to figure what's happening.

Thanks a lot for your traces, they're extremely helpful!

Willy


Reply via email to