yknoya opened a new pull request, #12729:
URL: https://github.com/apache/trafficserver/pull/12729

   # Problem
   https://github.com/apache/trafficserver/pull/9181 introduced an issue where 
an origin server was marked as down even though a connection had been 
successfully established.
   
   This issue occurs under the following conditions:
   1. `proxy.config.http.server_session_sharing.match` is set to a value other 
than `none` (i.e., server session reuse is enabled).
   2. A server session is reused when connecting to the origin.
   3. The connection is closed after sending a request to the origin.
   4. Condition 3 occurs repeatedly until it reaches the threshold defined by 
`proxy.config.http.connect_attempts_rr_retries`.
   
   The issue has been confirmed in the following branches/versions (other 
versions not tested):
   - master (90dbc21)
   - 10.1.0
   - 9.2.11
   
   # Cause
   When ATS begins processing an origin connection, it executes 
`t_state.set_connect_fail(EIO)` to tentatively set `connect_result` to `EIO`:
   
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/src/proxy/http/HttpSM.cc#L8054
   
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/include/proxy/http/HttpTransact.h#L932
   
   If server session reuse is not possible, `connect_result` is cleared once 
the connection is established:
   
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/src/proxy/http/HttpSM.cc#L1860
   
   However, when a server session is reused, `connect_result` is not cleared 
and remains set to `EIO`.
   This regression was triggered by the change introduced in  
https://github.com/apache/trafficserver/pull/9181 .
   
   Before the PR was merged, `t_state.set_connect_fail(EIO)` was not executed 
when a server session was reused.
   After the PR, it is executed regardless of whether a server session is 
reused or not.
   
   With `connect_result` incorrectly left as `EIO`, if the connection is closed 
after sending a request to the origin, the following call chain leads to 
execution of `HttpSM::mark_host_failure`, causing the `fail_count` to be 
incremented:
   1. 
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/src/proxy/http/HttpTransact.cc#L3466
   2. 
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/src/proxy/http/HttpTransact.cc#L3786
   3. 
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/src/proxy/http/HttpTransact.cc#L3884
   4. 
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/src/proxy/http/HttpSM.cc#L4630
   5. 
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/src/proxy/http/HttpSM.cc#L5876
   
   If this happens repeatedly and reaches the threshold defined by 
`proxy.config.http.connect_attempts_rr_retries`, the origin server is 
incorrectly marked as down:
   
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/src/proxy/http/HttpSM.cc#L5876-L5885
   
   Since the connection to the origin is actually successful, marking it as 
down is incorrect.
   
   # Fix
   Update the logic so that `t_state.set_connect_fail(EIO)` is executed only 
when establishing a new connection to the origin (i.e., when a server session 
is not reused), and ensure that `connect_result` is cleared once the connection 
succeeds.
   
   Additionally, when `multiplexed_origin` is true, `connect_result` was also 
not being cleared after a successful connection.
   In this case, although `t_state.set_connect_fail(EIO)` is executed (see 
below), the lack of a corresponding clear operation results in `connect_result` 
remaining `EIO`:
   
https://github.com/apache/trafficserver/blob/90dbc21a541986db6b223649cb4f798e190a550f/src/proxy/http/HttpSM.cc#L5706-L5723
   
   This patch ensures that `connect_result` is cleared whenever the connection 
succeeds, regardless of whether `multiplexed_origin` is enabled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to