See TS-1502 as well for some thoughts on coordinating traffic_cop with
traffic_server.

On Sep 28, 2012, at 3:33 PM, "Alan M. Carroll (JIRA)" <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/TS-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Alan M. Carroll reassigned TS-1487:
> -----------------------------------
> 
>    Assignee: Alan M. Carroll
> 
>> the ordering of plugin_init and init_HttpProxyServer cause crashed TS to 
>> core endlessly
>> ---------------------------------------------------------------------------------------
>> 
>>                Key: TS-1487
>>                URL: https://issues.apache.org/jira/browse/TS-1487
>>            Project: Traffic Server
>>         Issue Type: Bug
>>         Components: Core
>>   Affects Versions: 3.2.0
>>        Environment: Linux RHEL6.2
>>           Reporter: Aidan McGurn
>>           Assignee: Alan M. Carroll
>>           Priority: Critical
>>        Attachments: INTD-529-RespawnCrash.patch, INTD-529-RespawnCrash.patch
>> 
>> 
>> We've had a serious issue whereby TS, when it crashes, re-spawns and cores
>> continuously when it tries to restart under load. I traced the issue to the
>> SNMP research library (a third-party lib). It uses select() throughout, and
>> after the crash the file descriptor numbers spike under load as all the
>> sockets get opened at once. This causes a buffer overflow in the select
>> calls, because the fd passed to FD_SET is much larger than the FD_SETSIZE
>> of 1024 (which was a bitch to track down, as the stack was corrupted and
>> gdb therefore useless). Tracing why this happened on 3.2.0 and not 3.0.2,
>> I found that the position of plugin_init in the startup sequence has
>> changed. On 3.0.2 the sequence was in effect 1. plugin_init and then
>> 2. init_HttpProxyServer, whereas this has mysteriously been reversed on
>> 3.2.0. To get our system to work in this crash case, I've patched ATS to
>> flip them back around, as in 3.0.2.
>> I'll attach the patch we propose to use to get around this.
>> Is this actually a bug waiting to happen in other systems, or was there a
>> reason to change this sequence?
> 
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
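
For illustration, the overflow described in the quoted report comes from
calling FD_SET() with a descriptor at or above FD_SETSIZE, which writes past
the end of the fd_set. Below is a minimal sketch of a guard. The helper name
checked_fd_set is hypothetical; it is not part of ATS or of the SNMP library,
and the actual patch on the ticket changes the init ordering instead.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper (an assumption for illustration only):
 * add fd to the set only if it fits inside an fd_set.
 * Returns 0 on success, -1 if the descriptor is out of range. */
static int
checked_fd_set(int fd, fd_set *set)
{
  assert(set != NULL);
  if (fd < 0 || fd >= FD_SETSIZE) {
    /* FD_SET() on this fd would write outside the fd_set. */
    return -1;
  }
  FD_SET(fd, set);
  return 0;
}
```

A caller that gets -1 back would have to fall back to something other than
select() for that descriptor (poll() has no such limit), or refuse it.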
