bryancall opened a new issue, #12971: URL: https://github.com/apache/trafficserver/issues/12971
## Description ATS crashes with a failed assertion when a plugin uses `TSHttpTxnServerAddrSet()` to set a server address, the connection fails, and ATS retries with caching enabled. ``` Fatal: src/proxy/http/HttpTransact.cc:3503: failed assertion `s->redirect_info.redirect_in_process` ``` The crash occurs in `set_cache_prepare_write_action_for_new_request()` which is called from `HandleCacheOpenReadMiss()`. ## Root Cause The `TSHttpTxnServerAddrSet` retry logic added in #12733 (for issue #12611) does not account for the cache write lock state. The flow is: 1. Cache MISS → `OSDNSLookup` routes to `HandleCacheOpenReadMiss` 2. First pass: `set_cache_prepare_write_action_for_new_request` sets `action = PREPARE_TO_WRITE` 3. SM acquires the cache write lock → `write_lock_state = SUCCESS`, `action = WRITE` 4. Origin connection **fails** 5. Retry logic fires (line 3936): detects `os_addr_style == USE_API`, resets DNS state to `TRY_DEFAULT`, calls `CallOSDNSLookup()` 6. DNS resolves → `OSDNSLookup` → `cache_lookup_result == MISS` → routes back to `HandleCacheOpenReadMiss` 7. Second pass: `write_lock_state` is already `SUCCESS` → `set_cache_prepare_write_action_for_new_request` hits the assertion at line 3503 because `redirect_in_process == false` The assertion only expects `write_lock_state == SUCCESS` during redirects, but the connection retry is a second legitimate scenario where the write lock is already held. ## Why Tests Didn't Catch This The autest added in #12733 uses `enable_cache=False`. With caching disabled, `OSDNSLookup` routes to `LookupSkipOpenServer` instead of `HandleCacheOpenReadMiss`, so the cache write lock path is never exercised. ## Fix The assertion in `set_cache_prepare_write_action_for_new_request` needs to be relaxed to also accept the retry case. When `api_server_addr_set_retried` is true, the cache key is the same (same URL, different server address), so reusing the write lock is safe: ```cpp ink_release_assert(s->redirect_info.redirect_in_process || s->api_server_addr_set_retried); ``` The existing autest should also add a test run with caching enabled. ## Stack Trace ``` HttpTransact::set_cache_prepare_write_action_for_new_request(HttpTransact::State*) HttpTransact::HandleCacheOpenReadMiss(HttpTransact::State*) HttpSM::call_transact_and_set_next_state(void (*)(HttpTransact::State*)) HttpSM::state_api_callout(int, void*) HttpSM::state_api_callback(int, void*) TSHttpTxnReenable(tsapi_httptxn*, TSEvent) <plugin>.so INKContInternal::handle_event(int, void*) APIHook::invoke(int, void*) const HttpSM::state_api_callout(int, void*) HttpSM::state_hostdb_lookup(int, void*) ``` ## Key State at Crash ``` s->redirect_info.redirect_in_process = false s->cache_info.write_lock_state = SUCCESS s->cache_info.action = WRITE s->api_server_addr_set_retried = true s->response_error = CONNECTION_OPEN_FAILED s->transact_return_point = HandleCacheOpenReadMiss ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
