shinrich opened a new pull request #6889: URL: https://github.com/apache/trafficserver/pull/6889
A replacement for PR #6732, plus some additional fixes suggested by @masaori335.

The issue we have been tracking down for the last few months on ATS9 involves "stale" client_vc pointers. The first approach, taken in PR #6469, was to look for client_vc pointers that were still set after the client session's do_io_close and then close them. However, that approach would sometimes touch a client_vc object that had already been freed; with ASAN builds, @bneradt found a number of use-after-free cases.

We then took the approach in this PR: while an HttpSM or client session is still alive and referencing the client_vc, the read and write VIOs should reference the HttpSM, the session, or some other layer 7 object. The theory is that we were getting stale client_vc references because net events such as error, EOS, or timeout would occur while the read or write VIO was pointing to a null continuation. We have been running this logic in our ATS9 deployments since late April; crashes have been minimal, and those we did see were due to other issues.

@masaori335 pointed out that we should be able to close the netvc object at the point where the session do_io_close occurs. Making that change and running double*/openclose* with hundreds of transactions shows that this is safe.

Also extended the h2spec timeout from the default of 2 seconds. Given the timing vagaries of the CI environment, giving more timeout slack seemed better. With all the debugging on, I was seeing intermittent failures in a variety of cases.
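To illustrate the theory described above, here is a minimal, self-contained sketch. It is a toy model, not the actual ATS code: `NetVC`, `Vio`, `Session`, `signal`, and `attach` are hypothetical stand-ins invented for this example, and the real classes are considerably more involved. It only aims to show why a VIO with a null continuation can leave the owner holding a stale pointer, and how keeping the VIOs pointed at a live layer 7 object lets the owner clear its reference when a net event arrives.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT the real ATS types.
enum class NetEvent { Error, Eos, Timeout };

struct Continuation {
  virtual ~Continuation() = default;
  virtual void handle_event(NetEvent ev) = 0;
};

struct Vio {
  Continuation *cont = nullptr; // who gets told about events on this VIO
};

struct NetVC {
  Vio read_vio;
  Vio write_vio;
  bool closed = false;

  // Deliver a net event. Returns false if no continuation was registered,
  // which models the case where the owner never learns the event happened.
  bool signal(NetEvent ev) {
    Continuation *c = read_vio.cont ? read_vio.cont : write_vio.cont;
    if (c == nullptr) {
      return false;
    }
    c->handle_event(ev);
    return true;
  }

  void do_io_close() { closed = true; }
};

struct Session : Continuation {
  NetVC *client_vc = nullptr;

  // Keep both VIOs pointed at this session for as long as it holds client_vc.
  void attach(NetVC *vc) {
    client_vc          = vc;
    vc->read_vio.cont  = this;
    vc->write_vio.cont = this;
  }

  // Any error, EOS, or timeout tears the session down.
  void handle_event(NetEvent) override { do_io_close(); }

  // Close the netvc at the point where the session closes, then drop the pointer.
  void do_io_close() {
    if (client_vc != nullptr) {
      client_vc->do_io_close();
      client_vc = nullptr;
    }
  }
};
```

In this model, a `NetVC` whose VIOs both have a null continuation silently drops the event, so anything still pointing at it has no chance to clear its pointer. Once `attach` has been called, the same event reaches the session, which closes the netvc and nulls its own reference in one place.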
