[ https://issues.apache.org/jira/browse/TS-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459867#comment-13459867 ]

Aidan McGurn commented on TS-1487:
----------------------------------

Alan's initial comment:

Hmmmm. I strongly suspect there was a reason to switch the order; I know I
have had to move things around in that area previously to get some of my
changes to work. I'll try to take a look.
                
> The ordering of plugin_init and init_HttpProxyServer causes a crashed TS to
> core endlessly
> ---------------------------------------------------------------------------------------
>
>                 Key: TS-1487
>                 URL: https://issues.apache.org/jira/browse/TS-1487
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.2.0
>         Environment: Linux RHEL6.2
>            Reporter: Aidan McGurn
>            Priority: Critical
>         Attachments: INTD-529-RespawnCrash.patch
>
>
> We've had a serious issue where TS, after it crashes, continuously re-spawns
> and cores when it tries to restart under load. I traced the issue to the SNMP
> Research library (a third-party library). It uses select(), and after the
> crash the file descriptor numbers spike under load because all the sockets
> are opened at once. This causes a buffer overflow in the select() code (which
> the library is full of), because the fd passed to FD_SET is much larger than
> FD_SETSIZE (1024). This was very hard to track down, as the stack was
> corrupted and gdb was therefore useless. Tracing why this happened on 3.2.0
> and not on 3.0.2, I found that the sequence of plugin_init has changed: on
> 3.0.2 the sequence was in effect 1. plugin_init and then
> 2. init_HttpProxyServer, whereas on 3.2.0 this has mysteriously been
> reversed. To get our system to work in this crash case, I've patched ATS to
> flip them back around as in 3.0.2.
> I'll attach the patch we propose to use to get around this.
> Is this actually a bug waiting to happen in other systems, or was there a
> reason to change this sequence?
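The failure mode described above can be sketched in C. This is a minimal illustration under stated assumptions, not code from ATS or from the SNMP Research library; the helper names fd_fits_select and safe_fd_set are hypothetical. It relies only on two standard facts: an fd_set is a fixed-size bitmap of FD_SETSIZE bits (1024 on glibc), so FD_SET with a descriptor at or above FD_SETSIZE writes past the end of the structure; and POSIX requires functions that create descriptors (open(), socket(), accept()) to return the lowest-numbered unused descriptor, which is why startup ordering decides whether a select()-based library gets small descriptor numbers or ones above 1024.

```c
#include <sys/select.h>

/* An fd_set is a fixed-size bitmap holding FD_SETSIZE bits (1024 on glibc).
 * Only descriptors in [0, FD_SETSIZE) can be stored in it safely. */
static int fd_fits_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical defensive wrapper around FD_SET. Calling FD_SET directly with
 * fd >= FD_SETSIZE writes past the end of the fd_set; when the set lives on
 * the stack this can corrupt the stack, as in the report above.
 * Returns 0 if fd was added to the set, -1 if it was rejected as out of range. */
static int safe_fd_set(int fd, fd_set *set)
{
    if (!fd_fits_select(fd))
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

Such a guard only turns silent memory corruption into a reportable error. A library that must watch descriptors numbered above FD_SETSIZE would need poll() or epoll instead, neither of which uses a fixed-size bitmap.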

