Gancho Tenev created TS-4161:
--------------------------------

             Summary: ProcessManager prone to stack-overflow
                 Key: TS-4161
                 URL: https://issues.apache.org/jira/browse/TS-4161
             Project: Traffic Server
          Issue Type: Bug
          Components: Manager
            Reporter: Gancho Tenev


ProcessManager::pollLMConnection() can get "stuck" in a loop while handling a 
large number of messages in a row from the same socket. 

Since alloca() is used to allocate a buffer on the stack for each message read 
from the socket, and those buffers are not released until the function returns, 
getting "stuck" in the loop can lead to a stack overflow. FWIW, the same could 
happen if the message length is big enough (accidentally or on purpose).

It can be reproduced easily by setting up:
            proxy.config.lm.pserver_timeout_secs: 0
            proxy.config.lm.pserver_timeout_msecs: 0
in records.config and running ./bin/traffic_manager. 

ATS crashes with a segfault in an unexpected place (while trying to allocate 
with malloc()). If you inspect the core, you will see that it got "stuck" in 
the loop and overflowed the stack before it crashed (it kept allocating buffers 
on the stack with alloca() until it ran out of stack space).

It is worth considering replacing the alloca() with a VLA (which "releases" 
memory when it goes out of scope on each iteration of the loop) or with 
ats_malloc(), which is supposedly less time-efficient but would handle bigger 
messages without the risk of a stack overflow. 
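
To illustrate the second option, here is a minimal sketch (not the actual ATS 
code) of a loop that uses a heap buffer whose lifetime is a single iteration. 
std::vector stands in for ats_malloc()/ats_free(), and drain_messages() and 
handle_message() are hypothetical names used only for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-message handling done inside the loop.
static std::size_t handle_message(const char *data, std::size_t len)
{
  (void)data; // a real handler would dispatch on the message contents
  return len;
}

// Processes every queued message using a heap buffer that lives for exactly
// one loop iteration, so stack usage stays constant no matter how many
// messages arrive in a row.
std::size_t drain_messages(const std::vector<std::string> &queued)
{
  std::size_t total = 0;
  for (const std::string &msg : queued) {
    std::vector<char> buf(msg.begin(), msg.end()); // heap, not alloca()
    total += handle_message(buf.data(), buf.size());
  } // buf is released here, at the end of every iteration
  return total;
}
```

With alloca() the equivalent loop would keep every buffer alive until the 
function returned; here the memory is returned on each pass.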

IMO adding a message size limit check is good practice, especially with the 
current implementation.
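
A size check of that kind could look roughly like the sketch below. It assumes 
the length arrives as a signed 32-bit integer; MAX_MGMT_MSG_LEN and 
mgmt_msg_len_ok() are hypothetical names, and the 64 KiB cap is an arbitrary 
placeholder rather than a value taken from ATS.

```cpp
#include <cstdint>

// Hypothetical cap; the right value for ATS is not stated in this report.
constexpr std::int32_t MAX_MGMT_MSG_LEN = 64 * 1024;

// Returns true only if the length read from a message header is safe to use
// as a buffer size: non-negative and no larger than the cap.
bool mgmt_msg_len_ok(std::int32_t data_len)
{
  return data_len >= 0 && data_len <= MAX_MGMT_MSG_LEN;
}
```

The check would run before any buffer is allocated, so an oversized or 
negative length is rejected regardless of how the buffer is obtained.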

If the code gets "stuck" in the while loop while reading a large number of 
messages in a row from the same socket, then the port configured by 
proxy.config.process_manager.mgmt_port becomes unavailable (connection 
refused). Adding a limit on the number of messages that can be processed in a 
row would be a good idea.
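
One possible shape for such a limit is sketched below, with a std::deque 
standing in for the socket. MAX_MSGS_PER_POLL and poll_once() are hypothetical 
names and the value 100 is only a placeholder.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical upper bound on messages handled per call.
constexpr std::size_t MAX_MSGS_PER_POLL = 100;

// Handles at most MAX_MSGS_PER_POLL pending messages and then returns, so the
// caller gets a chance to service other work (e.g. the management port)
// before polling again. Returns the number of messages handled.
std::size_t poll_once(std::deque<std::string> &pending)
{
  std::size_t handled = 0;
  while (!pending.empty() && handled < MAX_MSGS_PER_POLL) {
    pending.pop_front(); // a real implementation would process the message
    ++handled;
  }
  return handled;
}
```

Messages left over after the limit is reached remain queued and are picked up 
on the next call.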

I stumbled upon this while running TSQA regression tests, where TSQA kept 
complaining that the management port was not available and ATS kept crashing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
