[ https://issues.apache.org/jira/browse/TS-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180000#comment-15180000 ]
ASF GitHub Bot commented on TS-4161:
------------------------------------
Github user gtenev commented on the pull request:
https://github.com/apache/trafficserver/pull/496#issuecomment-192317642
Sure, I will remove the `malloc` failure check. We could raise the limit
much higher or remove it altogether.
With the previous code, in the badly misconfigured corner case described in
TS-4161, the traffic_manager in my environment pumped 6628 messages non-stop
(1986000 bytes total) over 1.85 s until it overflowed the stack and crashed.
It kept enqueuing messages but never got a chance to process them. Since the
current implementation has no flow control, it seemed like a good idea to put
a limit in place to give the processing a chance to catch up (if possible),
or at least to pause the ever-growing allocation.
But this was a misconfigured corner case; I wonder how the "heaviest normal"
usage could be measured to set a reasonable limit, or whether I should remove
the limit altogether (open to suggestions).
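
For reference, here is a minimal sketch of the kind of cap I have in mind (the
constant, the helpers, and the limit value are placeholders for illustration,
not the actual patch):

```cpp
// Placeholder sketch of a per-poll cap; names and the limit value are made up.
bool message_available(int fd);            // assumed helper
void read_and_enqueue_one_message(int fd); // assumed helper

static const int MAX_MSGS_IN_A_ROW = 10000;

void poll_once(int fd)
{
  int processed = 0;
  // Stop draining after MAX_MSGS_IN_A_ROW messages so processing can catch up;
  // whatever is left on the socket is picked up on the next poll cycle.
  while (processed < MAX_MSGS_IN_A_ROW && message_available(fd)) {
    read_and_enqueue_one_message(fd);
    ++processed;
  }
}
```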
> ProcessManager prone to stack-overflow
> --------------------------------------
>
> Key: TS-4161
> URL: https://issues.apache.org/jira/browse/TS-4161
> Project: Traffic Server
> Issue Type: Bug
> Components: Manager
> Reporter: Gancho Tenev
> Assignee: Gancho Tenev
> Priority: Blocker
> Labels: crash
> Fix For: 6.2.0
>
>
> ProcessManager::pollLMConnection() can get "stuck" in a loop while handling
> a large number of messages in a row from the same socket.
> Since alloca() is used to allocate a buffer on the stack for each message read
> from the socket, and those buffers are not released until the function
> returns, getting "stuck" in the loop can lead to a stack overflow; the same
> could happen if a single message is large enough (accidentally or on purpose).
> It can be reproduced easily by setting up:
> proxy.config.lm.pserver_timeout_secs: 0
> proxy.config.lm.pserver_timeout_msecs: 0
> in records.config and running ./bin/traffic_manager.
> ATS crashes with a segfault in a weird place (while trying to allocate with
> malloc()). If you inspect the core, you can see that it got "stuck" in the
> loop before it crashed, overflowing the stack (it kept allocating buffers on
> the stack with alloca() until it crashed).
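>
> A simplified sketch of the pattern described above (illustrative only, not the
> actual ProcessManager code; the helpers are made up): every iteration calls
> alloca(), and alloca() memory is only reclaimed when the function returns, so
> the stack keeps growing for as long as messages keep arriving.
>
>     #include <alloca.h>
>     #include <stddef.h>
>
>     bool message_available(int fd);                    // assumed helper
>     size_t peek_message_length(int fd);                // assumed helper
>     void read_message(int fd, char *buf, size_t len);  // assumed helper
>
>     void drain_socket(int fd)
>     {
>       while (message_available(fd)) {        // can stay true for thousands of messages
>         size_t len = peek_message_length(fd);
>         char *buf = (char *)alloca(len);     // new stack allocation every iteration
>         read_message(fd, buf, len);          // buffers pile up until the stack overflows
>       }
>     }
>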
> It is worth considering replacing the alloca() with a VLA (which "releases"
> the memory when it goes out of scope on each iteration of the loop) or with
> ats_malloc(), which is supposedly less time-efficient but would handle bigger
> messages without the risk of a stack overflow.
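>
> A sketch of the heap-based alternative (again illustrative, with the same
> made-up helpers): each buffer lives for exactly one loop iteration, so memory
> use stays flat no matter how many messages arrive in a row or how large a
> single message is.
>
>     while (message_available(fd)) {
>       size_t len = peek_message_length(fd);
>       char *buf = (char *)ats_malloc(len);   // or plain malloc() outside ATS
>       read_message(fd, buf, len);
>       handle_message(buf, len);              // assumed processing hook
>       ats_free(buf);                         // released before the next message is read
>     }
>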
> IMO adding a message size limit check is good practice, especially with the
> current implementation.
> If the code gets "stuck" in the while loop while reading a large number of
> messages in a row from the same socket, then the port configured by
> proxy.config.process_manager.mgmt_port becomes unavailable (connection
> refused). Adding a limit on the number of messages that can be processed in a
> row seems like a good idea.
> I stumbled upon this while running TSQA regression tests, where TSQA kept
> complaining that the management port was not available and ATS kept
> crashing.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)