Hello,

On Thu, Oct 30, 2014 at 08:55:15PM +0100, Evert wrote:
> I am running haproxy in front of a asp.net application. The application has
> been running happily for a while!
> 
> Recently I have been running into very hard to trace errors. I could use
> some guidance to further analyse the situation.
> 
> What happens is that one of my clients can predictably generate an error in
> my application where the server is unable to process a POST result due to
> some unknown error in the POST data. The error is an asp.net index out of
> range error in a generic part of a binding operation. My code is then not
> yet in control of the situation. In other words, nothing I can do about it
> in my source code.
> 
> Unfortunately I am unable to reproduce the situation from my office. 
> 
> I went to visit the client yesterday and found he has a very complex and
> slow network infrastructure. The issue is not due to old browsers and the
> issue can be reproduced by using internet explorer or Chrome.

Slow networks sometimes cause "funny" things to happen or trigger very
deeply buried bugs. These are the hardest ones to reproduce, and sometimes
we can never reproduce them outside of their environment because they
depend on tiny races that require very precise timing to appear. Among
the tricks you can use to try to reproduce issues seen on slow networks,
you can plug your PC behind a 10 Mbps hub (if you can find one), and
sometimes playing with your MTU causes the same strange patterns to
appear, because you'll upload significantly slower with a very low MTU
(e.g. 300). But you're very lucky when that works...
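
If neither a hub nor an MTU change is practical, a rough application-level
approximation is to send the POST body in small delayed writes. The sketch
below is only an assumption about what might help, not a known reproducer:
it does not change packet sizes on the wire the way a low MTU does, and the
host, port, path, chunk size and delay are all hypothetical values to adapt.

```python
import socket
import time

def build_request(host, path, body):
    """Build a minimal HTTP/1.1 POST request, returned as (headers, body) bytes."""
    headers = (
        "POST %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: %d\r\n"
        "Connection: close\r\n"
        "\r\n" % (path, host, len(body))
    ).encode("ascii")
    return headers, body

def chunks(data, size):
    """Yield successive slices of at most `size` bytes."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

def slow_post(host, port, path, body, chunk_size=260, delay=0.05):
    """Send the body in small delayed writes and return the raw response bytes."""
    headers, body = build_request(host, path, body)
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle so each small write is sent without being coalesced.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        sock.sendall(headers)
        for piece in chunks(body, chunk_size):
            sock.sendall(piece)
            time.sleep(delay)  # hypothetical pacing, tune to mimic the slow link
        response = b""
        while True:
            data = sock.recv(4096)
            if not data:
                break
            response += data
    return response
```

Calling something like slow_post("test-host", 80, "/some/form.aspx", body)
against a test instance, with a body captured from a real form submission,
would show whether slow delivery alone is enough to trigger the error.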

> I analysed the haproxy log files and see the following actions that run
> without errors, this is from my office with high speed internet access:
> 
> [30/Oct/2014:20:32:39 +0100] bk_ws06/iis05 8309/0/1/350/8661 200 79576 - -
> ---- 23/23/0/0/0 0/0 {www.customera.nl|} POST
> /clientb/nl-nl/bestellingen/vooriemandanders.aspx HTTP/1.1
> 
> [30/Oct/2014:20:32:48 +0100] bk_ws06/iis05 1562/0/1/13548/19205 200 3455532
> - - ---- 31/31/0/1/0 0/0 {www.customera.nl|} POST
> /clientb/nl-nl/bestellingen/vooriemandanders.aspx HTTP/1.1

I see that you stripped the source ip:port information to avoid public
disclosure, which is fine, but could you at least verify that the source
port was the same in both cases? I guess so given the long delay before
the request, which indicates either a slow network or a pause between two
requests over the same connection.
 
> But the client experiences the following with a slow connection:
> 
> [30/Oct/2014:11:51:39 +0100] bk_ws06/iis05 6916/0/0/633/7551 200 74996 - -
> ---- 39/39/0/0/0 0/0 {www.customera.nl|} POST
> /clientb/nl-nl/bestellingen/vooriemandanders.aspx HTTP/1.1
> 
> [30/Oct/2014:11:51:47 +0100] bk_ws06/iis05 4421/0/0/7226/45463 200 3944897 -
> - ---- 33/33/0/0/0 0/0 {www.customera.nl|} POST
> /clientb/nl-nl/bestellingen/vooriemandanders.aspx HTTP/1.1

So same here, the source port information will be important.

> [30/Oct/2014:11:52:32 +0100] bk_ws06/iis05 141644/0/1/17201/160926 302 635 -
> - ---- 52/52/0/0/0 0/0 {www.customera.nl|} POST
> /clientb/nl-nl/bestellingen/vooriemandanders.aspx HTTP/1.1
> 
> [30/Oct/2014:11:55:13 +0100] bk_ws06/iis05 0/0/0/93/93 200 28992 - - ----
> 52/52/0/1/0 0/0 {www.customera.nl|} GET
> /Default.aspx?tabid=14471&error=An%20unexpected%20error%20has%20occurred&con
> tent=0 HTTP/1.1

OK so the server has returned a 302 with a location pointing to this
error message that the client can display. Now the question is... why
did the server find an error here :-/

> What really surprises me is the third line showing a 302 http code which I
> do not expect. 

It's the error above, don't worry about this one. I see that it took
17 seconds to upload the whole contents. Do you know if the server is
normally capable of waiting that long for the data? In your example it
took 13 seconds, so there's probably a mix of network time and server
time there.
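
For reference, the figures above come from the five slash-separated timers
in each log line. Assuming the lines use haproxy's usual HTTP log layout
(Tq/Tw/Tc/Tr/Tt, all in milliseconds; the excerpts are stripped, so this
is an assumption), a small sketch to pull them apart could look like this.
The function names are only illustrative.

```python
# Timer names as documented for haproxy's HTTP log format (values in ms).
TIMER_NAMES = ("Tq", "Tw", "Tc", "Tr", "Tt")

def parse_timers(field):
    """Split a 'Tq/Tw/Tc/Tr/Tt' log field into a dict of millisecond values."""
    values = [int(v) for v in field.split("/")]
    if len(values) != len(TIMER_NAMES):
        raise ValueError("expected 5 timers, got %d" % len(values))
    return dict(zip(TIMER_NAMES, values))

def data_transfer_ms(t):
    """Time spent after the response headers: Tt - (Tq + Tw + Tc + Tr)."""
    return t["Tt"] - (t["Tq"] + t["Tw"] + t["Tc"] + t["Tr"])

# Timer fields copied from the second office request and the failing request.
office = parse_timers("1562/0/1/13548/19205")
client = parse_timers("141644/0/1/17201/160926")
print(office["Tr"], client["Tr"])
```

Tr is the 13 and 17 second value discussed above; the time spent before
the request (Tq) is what differs most between the two, at about 1.5 seconds
from the office and more than 141 seconds at the client.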

> In my haproxy config I set the following "defaults":
> 
>         timeout connect 5s
>         timeout client 240s
>         timeout server 240s
>         option http-server-close
>         retries 3

Nothing special here. What version is this? Please use "haproxy -vv"
to report all the build options as well. What else do you have in the
frontend and backend exhibiting this behaviour? I'm thinking about
header rewrites, cookies, "option http-send-name-header", "balance uri"
or "balance url_param", etc.

> I tried to use fiddler at my office to simulate a slow connections but I
> cannot reproduce the situation.
> 
> What can I do?

Depending on the version, we'll see. If it matches a known buggy version,
the best way will be to upgrade. But if no known bug relates to this, the
next step will be to take network traces :-/ I don't think you necessarily
need to go back to the customer's site to take them; taking both of them
on the haproxy machine (one in front, one behind) should be enough to
understand what is happening. It's possible that the client sends invalid
data, just like it's possible that haproxy does something unexpected.
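
As a rough sketch of what "one in front, one behind" could look like with
tcpdump, the helper below only builds the two command lines. The interface
names, port, output files and server address are hypothetical and must be
replaced with the real ones; it assumes plain HTTP on both sides.

```python
def capture_command(interface, port, outfile, host=None):
    """Build a tcpdump argv capturing full packets for one TCP port."""
    # -i: interface, -s 0: full-size packets, -w: write raw packets to a file
    cmd = ["tcpdump", "-i", interface, "-s", "0", "-w", outfile,
           "tcp", "port", str(port)]
    if host is not None:
        # Optionally restrict the capture to a single peer address.
        cmd += ["and", "host", host]
    return cmd

# Hypothetical values: client side on eth0, server side on eth1.
front = capture_command("eth0", 80, "front.pcap")
back = capture_command("eth1", 80, "back.pcap", host="192.0.2.10")
print(" ".join(front))
print(" ".join(back))
```

Both captures need to run at the same time while the failing POST is
replayed, so that the request seen in front can be compared with the one
forwarded behind.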

So please first start with the elements above so that we can decide if
you need to provide more information or not. In any case we won't ask you
to send any confidential information to the list.

Regards,
Willy

