[
https://issues.apache.org/jira/browse/TS-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713019#comment-13713019
]
Alan M. Carroll commented on TS-1487:
-------------------------------------
Commit comments
1) Revert the initialization order to what it was in 3.0, where plugins are
initialized very early. This part solves TS-2035, which had a race condition
where the proxy start-up code depended on a side effect of the plugin init
logic.
2) Add lifecycle hooks to preserve the functionality that motivated the
original change in ordering. See ts.h.in for details; look for
TSLifecycleHookID.
3) Added a new configuration option:
proxy.config.http.wait_for_cache INT 0
If this is set to a non-zero value, the listen/accept calls on the proxy
ports are delayed until cache initialization has finished. If traffic_server
is run directly, opening the sockets is delayed as well, but if it is run via
traffic_manager the sockets will already be open before the traffic_server
process starts. If this is still a problem, it may be reasonable to tweak
traffic_manager to not open the sockets when this option is set. In that case,
traffic_server will open the sockets as needed.
4) Removed the "no-api" build option. It got in the way while I was working on
this and is considered to be useless, so I cleared it out.
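
For item 3, a minimal sketch of turning the option on in records.config,
assuming the usual CONFIG line syntax. The value 1 is only an example of a
non-zero setting; the default of 0 shown above leaves the listen/accept calls
undelayed.

```
CONFIG proxy.config.http.wait_for_cache INT 1
```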
> the ordering of plugin_init and init_HttpProxyServer cause crashed TS to core
> endlessly
> ---------------------------------------------------------------------------------------
>
> Key: TS-1487
> URL: https://issues.apache.org/jira/browse/TS-1487
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Affects Versions: 3.2.0
> Environment: Linux RHEL6.2
> Reporter: Aidan McGurn
> Assignee: Alan M. Carroll
> Priority: Critical
> Labels: A
> Fix For: 3.3.5
>
> Attachments: INTD-529-RespawnCrash.patch,
> INTD-529-RespawnCrash.patch, ts-1487.diff
>
>
> We've had a serious issue whereby TS, when it crashes, re-spawns and cores
> continuously when it tries to restart under load. I traced the issue to the
> SNMP research library (a third-party lib). They use select(), and what
> happens is that the file descriptor numbers spike under load after the crash
> as all the sockets get opened at once. This causes a buffer overflow in the
> select calls (which their library is full of), as the fd passed to FD_SET is
> much bigger than the FD_SETSIZE of 1024 (which was a bitch to track down as
> the stack was corrupted and gdb therefore useless). Tracing why this happened
> on 3.2.0 and not 3.0.2, I found that the sequence of plugin_init has changed.
> On 3.0.2 the sequence was in effect 1. plugin_init and then
> 2. init_HttpProxyServer, whereas this has mysteriously been reversed on
> 3.2.0. In order to get our system to work in this crash case, I've patched
> ATS to flip them around like in 3.0.2.
> I'll attach the patch we propose to use to get around this.
> Is this actually a bug waiting to happen in other systems, or was there a
> reason to change this sequence?
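
The overflow described in the quoted report comes from calling FD_SET with a
descriptor at or above FD_SETSIZE, which POSIX leaves undefined. A minimal C
sketch of the kind of guard that avoids it follows. Note that safe_fd_set is a
hypothetical helper written for illustration; it is not part of ATS or of the
SNMP library mentioned in the report.

```c
#include <sys/select.h>

/* Hypothetical helper (an assumption for illustration, not an ATS or SNMP
 * library function): add fd to the set only if an fd_set can represent it.
 * Returns 0 on success, -1 if the descriptor is out of range. */
int safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        /* FD_SET on this fd would write outside the fd_set storage. */
        return -1;
    }
    FD_SET(fd, set);
    return 0;
}
```

A caller would treat a -1 return as a reason to refuse the descriptor or to
fall back to poll(), rather than letting FD_SET write past the end of the
fd_set.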