GUI opened a new issue, #10393: URL: https://github.com/apache/trafficserver/issues/10393
When upgrading from 9.1.4 to 9.2.x, I've observed requests failing in unexpected ways that they didn't in 9.1.4. I'm not really certain what's happening, so it's a bit difficult to give a summary, but it seems like Traffic Server 9.2.x is closing connections to the clients in front of it too early in certain cases. My only theory is that it's somehow related to larger request bodies (and maybe specific to PUT requests), and maybe only when these larger request bodies are still being streamed after the origin server generates an (expected) error? But again, not really sure.

Here's a more detailed example of how this is reproducible in all versions of 9.2.0-9.2.2, which also demonstrates that it didn't happen in 9.1.4. The basic reproducible case I've narrowed this down to looks like this:

```
[nginx proxy] => [trafficserver] => [nginx server]
```

1. The `nginx proxy` layer does *not* have a maximum request body size.
2. The underlying `nginx server` component *is* set up with a maximum request body size. If a client sends a request body that exceeds this size, then nginx returns a `413 Request Entity Too Large` error.

The basic issue I'm seeing is that if a client exceeds this request body size at the `nginx server` origin layer, then Traffic Server 9.2+ seems to behave in unexpected ways:

1. **In Traffic Server 9.1.4:** The `nginx proxy` layer (and the client making the request) reliably receives the `413 Request Entity Too Large` error that the origin `nginx server` layer generates and that is proxied via Traffic Server.
2. **In Traffic Server 9.2.2:** The `nginx proxy` layer receives the expected 413 error (from the `nginx server` origin) maybe 50% of the time, but the other 50% of the time the `nginx proxy` ends up reporting a `502 Bad Gateway` error, which nginx generates due to an apparent communication error with `trafficserver`.
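For illustration, the client side of this comparison can be sketched as a small loop that sends repeated large PUT requests and tallies the status codes that come back. This is only a minimal sketch, not the script from the reproduction repo: the URL, port, body size, and the `put_once`/`tally` names are all assumptions to adapt to your own setup.

```python
import urllib.error
import urllib.request
from collections import Counter

# Hypothetical values, not taken from the reproduction repo.
PROXY_URL = "http://127.0.0.1:8080/"  # assumed address of the `nginx proxy` layer
BODY_SIZE = 5 * 1024 * 1024           # 5 MB, assumed to exceed the origin's body limit


def put_once(url: str, size: int, timeout: float = 30.0) -> int:
    """Send one PUT with a `size`-byte body and return the HTTP status code."""
    req = urllib.request.Request(url, data=b"x" * size, method="PUT")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses such as 413 or 502.
        # Connection-level failures (resets, timeouts) are not caught here
        # and propagate to the caller.
        return err.code


def tally(url: str, size: int, attempts: int, send=put_once) -> Counter:
    """Repeat the request `attempts` times and count each status code seen."""
    return Counter(send(url, size) for _ in range(attempts))
```

Calling `tally(PROXY_URL, BODY_SIZE, attempts=20)` against each setup would be expected to show only 413s with 9.1.4 in the middle and a mix of 413s and 502s with 9.2.x, if the behavior described above holds.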
The 502 errors seem to indicate that the connection from `nginx proxy` to `trafficserver` is being closed by Traffic Server too early, before the `413` error can be proxied back successfully.

Here is a repo that contains a minimal reproduction of this along with more detailed steps: https://github.com/GUI/trafficserver-debugging

This issue appears to be present using all default Traffic Server configuration, so there's no custom Traffic Server configuration other than proxying to the underlying server.

See the repo's README for exact steps to reproduce and more examples of the expected output in Traffic Server 9.1.4 versus the new, more erratic behavior in Traffic Server 9.2.2. The short version is that Traffic Server 9.1.4 will always return the expected `413 Request Entity Too Large` that is proxied from the underlying origin server, but when Traffic Server 9.2.x is in the middle, nginx's connections to Traffic Server randomly fail and the `nginx proxy` layer generates `502 Bad Gateway` errors.

A few notes I've observed:

- It happens more readily if the request body size is bigger (e.g., more than a couple of MBs).
- Strangely, I can reproduce it reliably for PUT requests with a body, but not POST requests.
- In tcpdumps, I've observed TCP RSTs under Traffic Server 9.2.x during these situations, where there don't appear to be any RSTs in 9.1.x.
- I've been able to reproduce this in both 9.2.0 and 9.2.2, so it seems like it's related to some change between 9.1.4 and 9.2.0.

Thanks!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@trafficserver.apache.org.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org