shinrich opened a new pull request #7291:
URL: https://github.com/apache/trafficserver/pull/7291


   This is one possible solution to the problem described in issue #7222, 
where a single POST failure immediately marks a server dead/down.
   
   This PR uses the previously unused fail_count field in the set of hostdb 
structures to track how many times in a row an address for a name has failed.  
When a transaction is not retryable (e.g. a POST failure where data has 
already been sent to the server), this code change updates 
app.http_data.fail_count and updates the app.http_data.last_failure field only 
if fail_count is greater than or equal to the value of 
proxy.config.http.connect_attempts_rr_retries.  So if that were set to 3, three 
consecutive failures would need to occur before that address for the server 
name would be marked down.
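
   As a rough illustration of the threshold rule described above, here is a 
minimal sketch.  It is not the code in this PR: HttpData, record_failure and 
record_success are hypothetical stand-ins, and resetting the counter on 
success is an assumption drawn from "in a row".

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical stand-in for the per-address HTTP data kept in HostDB.
struct HttpData {
  uint32_t fail_count   = 0; // consecutive failures seen for this address
  time_t   last_failure = 0; // 0 means the address is not marked down
};

// Record one non-retryable failure. The address is only marked down
// (last_failure set) once fail_count reaches rr_retries, which plays the
// role of proxy.config.http.connect_attempts_rr_retries.
// Returns true if the address is now marked down.
bool record_failure(HttpData &d, uint32_t rr_retries, time_t now)
{
  ++d.fail_count;
  if (d.fail_count >= rr_retries) {
    d.last_failure = now;
    return true;
  }
  return false;
}

// Assumption: a successful transaction clears the consecutive-failure state.
void record_success(HttpData &d)
{
  d.fail_count   = 0;
  d.last_failure = 0;
}
```

   With rr_retries set to 3, the first two calls to record_failure leave 
last_failure untouched and the third one marks the address down.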
   
   I tested this in an environment with microdns set up to reply with addresses 
192.168.1.10 and 192.168.1.13 for the name "foo".
   
   I had servers listening on port 8888 on 192.168.1.10 and 192.168.1.13, but 
the version running on 192.168.1.13 would wait for 40 seconds before replying 
to a POST.  The other server responded immediately.  The 
transaction_no_activity_timeout_out was set to 10 seconds.
   
   Without this code change, the POST request would hit 192.168.1.13 first and 
time out.  Then subsequent requests would be sent to 192.168.1.10 until the 
interval specified in proxy.config.http.down_server.cache_time had passed.
   
   With this code change, the POST requests would be sent to 192.168.1.13 for 
the number of times specified in proxy.config.http.connect_attempts_rr_retries 
before marking that address down for "foo" and moving on to 192.168.1.10 
exclusively until the time specified in 
proxy.config.http.down_server.cache_time had passed.
   
   This will not guarantee that the bad address is tried exactly 
proxy.config.http.connect_attempts_rr_retries times.  Since the HostDBInfo is 
copied into the HttpSM area and then updates are copied back to the main HostDB 
store, it is quite possible that concurrent threads will cancel out each 
other's updates.  But this should get reasonably close to matching the retry 
count.
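
   The lost-update effect mentioned above can be shown with a small, 
single-threaded sketch.  HostEntry and two_overlapping_failures are 
hypothetical names used only for illustration; the real HostDBInfo and HttpSM 
code paths are more involved.

```cpp
#include <cstdint>

// Hypothetical stand-in for the shared HostDB record.
struct HostEntry {
  uint32_t fail_count = 0;
};

// Two transactions each copy the shared record, bump their private copy,
// and write it back. Because B copied before A wrote back, B's write
// overwrites A's update and one failure goes uncounted.
uint32_t two_overlapping_failures(HostEntry &shared)
{
  HostEntry a = shared; // transaction A takes its copy
  HostEntry b = shared; // transaction B takes its copy before A finishes
  ++a.fail_count;       // A observes a failure
  ++b.fail_count;       // B observes a failure
  shared = a;           // A writes back
  shared = b;           // B writes back, discarding A's increment
  return shared.fail_count;
}
```

   Two overlapping failures raise the shared count by only one here, which is 
why the bad address may be tried a few more times than the configured value.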



