[ 
https://issues.apache.org/jira/browse/TS-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461901#comment-13461901
 ] 

Leif Hedstrom commented on TS-1487:
-----------------------------------

As for Yongming's comment: remember that this is by design. I'm definitely 
for making it possible to start up without requiring the cache to be made 
available, but it needs to be optional. The idea behind the current design is 
that in a setup with a single server (or a small number of servers), it's 
better to proxy than to refuse connections until the cache is up.
                
> the ordering of plugin_init and init_HttpProxyServer cause crashed TS to core 
> endlessly
> ---------------------------------------------------------------------------------------
>
>                 Key: TS-1487
>                 URL: https://issues.apache.org/jira/browse/TS-1487
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.2.0
>         Environment: Linux RHEL6.2
>            Reporter: Aidan McGurn
>            Priority: Critical
>         Attachments: INTD-529-RespawnCrash.patch
>
>
> We've had a serious issue whereby TS, when it crashes, re-spawns and cores 
> continuously when it tries to restart under load. I traced the issue to the 
> SNMP research library (a third-party lib). They use select(), and what 
> happens is that the file descriptor numbers spike under load after the crash, 
> as all the sockets get opened at once. This causes a buffer overflow in the 
> select calls (which their library is full of), as the fd passed to FD_SET is 
> much bigger than the FD_SETSIZE of 1024 (which was a bitch to track down, as 
> the stack was corrupted and gdb therefore useless). Tracing why this happened 
> on 3.2.0 and not 3.0.2, I found that the ordering of plugin_init has changed. 
> On 3.0.2 the sequence was in effect 1. plugin_init and then 
> 2. init_HttpProxyServer, whereas this has mysteriously been reversed on 3.2.0. 
> To get our system to work in this crash case, I've patched ATS to flip them 
> around as in 3.0.2.
> I'll attach the patch we propose to use to get around this.
> Is this actually a bug waiting to happen in other systems, or was there 
> a reason to change this sequence?
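
For readers unfamiliar with the failure mode described above: FD_SET() writes 
into a fixed-size bitmap, so a descriptor numbered FD_SETSIZE or higher lands 
outside the fd_set and can corrupt adjacent memory. The following is only a 
minimal sketch of a defensive range check; the helper names 
fd_fits_in_fd_set and safe_fd_set are hypothetical and do not come from ATS 
or from the third-party library mentioned in the report.

```c
#include <stdbool.h>
#include <sys/select.h>

/* Hypothetical helper: true only if fd can legally be stored in an fd_set. */
static bool fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/*
 * Hypothetical helper: add fd to the set only when it is in range.
 * Returns 0 on success, or -1 if the descriptor cannot be used with
 * select() and the caller must fall back to something else.
 */
static int safe_fd_set(int fd, fd_set *set)
{
    if (!fd_fits_in_fd_set(fd)) {
        return -1; /* would write past the end of the fd_set bitmap */
    }
    FD_SET(fd, set);
    return 0;
}
```

A caller that gets -1 back could, for example, use poll() for that 
descriptor, since poll() takes an array of descriptors rather than a 
fixed-size bitmap. Whether that is practical inside a third-party library is 
a separate question.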

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
