6.2.1, TS-5046, and the case of the serious segfaults

Dale Ghent Thu, 09 Feb 2017 10:32:54 -0800

Hey all;

For the past several days, I've been tracking down a crash in traffic_server 
6.2.1 that happens fairly regularly on our cache nodes. The basic pathology is 
that server_vc is NULL under some circumstances, which has caused segfaults in 
code paths that do not guard against this condition. This issue was raised 
last year in ticket TS-5046 and reported as fixed in the 6.2.1 release:

https://issues.apache.org/jira/browse/TS-5046

The fix went in via PR 1222, which backported the fix for a different but 
related issue reported in TS-4938, also reported as fixed in 6.2.1:

https://issues.apache.org/jira/browse/TS-4938

In the stack trace given in TS-5046, the reporter hit the segfault in 
HttpSM::tunnel_handler_server(). Their path there differed from ours, but the 
effect was the same. In my case, the segfaults always happen during processing 
in the gzip plugin:

https://paste.ec/paste/ZJnXEDC6#Rd-AeEkmmbsqM6E4rA/WuFdcgnUrv9By+rKO6eaxWmY

As a workaround, I applied the band-aid fix Masa mentioned in TS-5046 to 
HttpSM::tunnel_handler_server(). This of course avoids the NULL pointer 
dereference there, but the crash now occurs at the next point down the stack:

https://paste.ec/paste/+JwWMtuB#ew-87NnW1hQeFzwOCZ5GC+93EPF45YeJut/MtZwsdro

Obviously we could keep playing whack-a-mole and add guards for a NULL 
server_vc in HttpServerSession::release(), but I feel this probably isn't the 
desired route. Questions such as "Is server_vc *ever* supposed to be NULL?" and 
"Why would it be NULL in the first place?" remain, so I'm turning to more 
seasoned eyes here for your thoughts. At any rate, TS-5046/TS-4938 do not 
appear to have completely addressed the situation.
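
To make the whack-a-mole concern concrete, here is a minimal C++ sketch under 
stated assumptions: NetVC, ServerSession, tunnel_handler_would_crash() and 
release_would_crash() are hypothetical, heavily simplified stand-ins, not the 
real Traffic Server classes, and they report whether a NULL dereference would 
happen instead of actually faulting. It only illustrates why a guard in the 
handler alone doesn't help: the session is passed along unchanged, so the next 
consumer that assumes a valid server_vc still sees NULL.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for illustration only;
// these are NOT the real Traffic Server NetVConnection/HttpServerSession types.
struct NetVC {
  int fd = -1;
};

struct ServerSession {
  explicit ServerSession(NetVC *vc = nullptr) : server_vc(vc) {}
  NetVC *server_vc;  // the pointer observed to be NULL in the crashes

  // Models a callee like release() that assumes server_vc is valid.
  // Rather than dereferencing (and segfaulting), it reports whether it would.
  bool release_would_crash() const { return server_vc == nullptr; }
};

// Models a caller patched with a band-aid NULL guard: it skips its own use
// of server_vc, but still hands the same session on to release().
bool tunnel_handler_would_crash(const ServerSession &s) {
  if (s.server_vc != nullptr) {
    (void)s.server_vc->fd;  // guarded use inside the handler
  }
  // The guard did not change the session, so the next unguarded consumer
  // still sees NULL: the crash just moves one frame down the stack.
  return s.release_would_crash();
}

// Usage: a healthy session is fine, a NULL one still "crashes" in release().
void demo() {
  NetVC vc;
  assert(!tunnel_handler_would_crash(ServerSession(&vc)));
  assert(tunnel_handler_would_crash(ServerSession()));
}
```

Each extra guard only pushes the failure to the next unguarded use, which is 
why the root question is where and why server_vc ends up NULL.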

/dale
