sudheerv edited a comment on issue #7096:
URL: https://github.com/apache/trafficserver/issues/7096#issuecomment-707689651


   After more than two months of debugging, I finally found the root cause of
this issue. It is mainly caused by changing the (read) mutex on the server vc
before acquiring it from and releasing it to the pool. The net i/o read
(including EOS) is meant to be synchronized with the read vio mutex (which is
the session pool mutex before acquiring from the pool and the HttpTunnel’s
mutex before releasing to the pool), but in the process of disabling the i/o,
the mutex is changed by calling do_io_read() with a null continuation. This
lets a net read in and also makes the “closed” flag check in net_read_io()
unreliable, corrupting the ssl or other heap memory.
   
   Reproduced the behavior by tweaking the mutex windows, and tested a fix that
tightens the windows and ensures the mutex is preserved until the session
acquire/release process has completed.
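
   To illustrate the window described above, here is a minimal single-threaded
sketch. It is not Traffic Server code: ToyMutex, ToyVC, net_read_gets_in() and
release_to_pool() are hypothetical names, and the "other" mutex stands in for
whatever mutex the read vio ends up with after the swap. The point is only
that a reader which locks "whatever mutex the vc currently advertises" is let
in as soon as that mutex is swapped for one the releasing side does not hold.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Toy try-lock (hypothetical; stands in for the real mutex type).
struct ToyMutex {
  std::atomic<bool> held{false};
  bool try_lock() { return !held.exchange(true); } // true if we took the lock
  void unlock() { held.store(false); }
};

// Toy model of the read side of a server vc (hypothetical).
struct ToyVC {
  std::shared_ptr<ToyMutex> read_mutex; // mutex meant to guard net reads
};

// Models the net read path: it may only proceed if it can take the mutex
// that the vc currently advertises for reads.
bool net_read_gets_in(ToyVC &vc) {
  std::shared_ptr<ToyMutex> m = vc.read_mutex;
  if (!m->try_lock()) {
    return false; // owner holds the lock, read is kept out
  }
  m->unlock();
  return true; // read was let in
}

// Models releasing the vc to the pool while holding the tunnel mutex.
// Returns true if a net read managed to get in during the release.
bool release_to_pool(bool swap_mutex_midway) {
  auto tunnel_mutex = std::make_shared<ToyMutex>();
  auto other_mutex  = std::make_shared<ToyMutex>(); // nobody holds this one
  ToyVC vc;
  vc.read_mutex = tunnel_mutex;
  (void)tunnel_mutex->try_lock(); // releasing side takes the tunnel mutex

  if (swap_mutex_midway) {
    // Assumed analogue of disabling i/o in a way that changes the read mutex.
    vc.read_mutex = other_mutex;
  }
  bool read_got_in = net_read_gets_in(vc);

  tunnel_mutex->unlock();
  return read_got_in;
}
```

   With the swap, release_to_pool(true) reports that the read got in; when the
mutex is preserved for the whole release, release_to_pool(false) reports that
it was kept out.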
   
   These issues are mostly specific to global session sharing, and for some
reason having a lot of transform plugins acting on the server vc seems to
expose the race condition more.
   
   In the process of debugging this, added a couple of mechanisms that helped:
   1) tracking a VC event/callback history, much like the SM history
   2) ssl -> vc bookkeeping to catch and prevent double deletes (this has
already caught a couple of double-delete bugs)
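
   The two debugging aids above could look roughly like the following sketch.
This is not the actual patch: ToyEventHistory, ToyTrackedVC and ToySslRegistry
are hypothetical names, the history depth of 4 is arbitrary, and events are
plain ints. The history is a small ring buffer of the most recent events per
vc; the registry maps an ssl pointer to its owning vc so that a second delete
of the same ssl object can be detected instead of corrupting memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical fixed-size ring buffer of the most recent events seen by a vc.
struct ToyEventHistory {
  static constexpr std::size_t kSlots = 4; // arbitrary depth for the sketch
  std::array<int, kSlots> events{};
  std::size_t total = 0; // number of events ever recorded

  void record(int event) {
    events[total % kSlots] = event; // overwrite the oldest slot
    ++total;
  }

  // age 0 is the most recent event; returns -1 if that entry is not stored.
  int recent(std::size_t age) const {
    if (age >= total || age >= kSlots) {
      return -1;
    }
    return events[(total - 1 - age) % kSlots];
  }
};

// Hypothetical vc that carries its own history.
struct ToyTrackedVC {
  ToyEventHistory history;
};

// Hypothetical ssl -> vc bookkeeping table.
class ToySslRegistry {
  std::unordered_map<const void *, ToyTrackedVC *> owners_;

public:
  // Returns false if this ssl object is already attached to a vc.
  bool attach(const void *ssl, ToyTrackedVC *vc) {
    return owners_.emplace(ssl, vc).second;
  }

  // Returns true only for the first detach; a repeat (a would-be double
  // delete) returns false so the caller can skip the free and log it.
  bool detach(const void *ssl) { return owners_.erase(ssl) == 1; }

  ToyTrackedVC *owner(const void *ssl) const {
    auto it = owners_.find(ssl);
    return it == owners_.end() ? nullptr : it->second;
  }
};
```

   A caller would record() each event before dispatching the callback, and
free the ssl object only when detach() returns true.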


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]