[
https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154736#comment-14154736
]
Victor edited comment on TS-3104 at 10/1/14 12:21 PM:
------------------------------------------------------
When the issue was reproduced, syslog (journalctl) showed numerous
"unable to retrieve manager_binary" messages. After applying the attached
patches the issue was gone and the processes were restarted correctly by
traffic_cop. The following tests were run:
* kill `pgrep traffic_manager`
* kill -9 `pgrep traffic_manager`
* kill `pgrep traffic_server`
* kill -9 `pgrep traffic_server`
* kill `pgrep traffic_manager`; kill `pgrep traffic_server`
* kill -9 `pgrep traffic_manager`; kill -9 `pgrep traffic_server`
In all cases both traffic_manager and traffic_server were restarted correctly,
and no endless loop of traffic_cop trying to restart traffic_manager was seen.
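
The six cases above can be scripted. What follows is a minimal sketch, not part
of the attached patches: the helper names (wait_for_new_pid, kill_and_check,
run_all), the TIMEOUT variable and its 30 second default are assumptions, and
it presumes a single instance of each process running under traffic_cop.

```shell
#!/bin/sh
# Sketch only: helper names, TIMEOUT and the 30 s default are assumptions,
# not part of the attached patches.

# Poll until pgrep reports a PID for NAME that differs from OLD_PID.
# Returns 0 once a new PID shows up, 1 after TIMEOUT seconds.
wait_for_new_pid() {
    wn_name=$1
    wn_old=$2
    wn_timeout=${3:-30}
    wn_elapsed=0
    while [ "$wn_elapsed" -lt "$wn_timeout" ]; do
        for wn_pid in $(pgrep "$wn_name"); do
            if [ "$wn_pid" != "$wn_old" ]; then
                return 0
            fi
        done
        sleep 1
        wn_elapsed=$((wn_elapsed + 1))
    done
    return 1
}

# Send SIGNAL (TERM or KILL) to every named process, then wait for each
# one to come back under a new PID.
# Returns 0 if all restarted, 1 if any did not, 2 if one was not running.
kill_and_check() {
    kc_signal=$1
    shift
    kc_entries=""
    # Record the current PID of every target before signalling anything.
    for kc_name in "$@"; do
        kc_pid=$(pgrep "$kc_name" | head -n 1)
        if [ -z "$kc_pid" ]; then
            echo "SKIP: $kc_name is not running"
            return 2
        fi
        kc_entries="$kc_entries $kc_name:$kc_pid"
    done
    for kc_entry in $kc_entries; do
        kill "-$kc_signal" "${kc_entry#*:}" ||
            echo "WARN: could not signal ${kc_entry%%:*}"
    done
    kc_rc=0
    for kc_entry in $kc_entries; do
        if wait_for_new_pid "${kc_entry%%:*}" "${kc_entry#*:}" "${TIMEOUT:-30}"; then
            echo "OK: ${kc_entry%%:*} restarted after SIG$kc_signal"
        else
            echo "FAIL: ${kc_entry%%:*} not restarted after SIG$kc_signal"
            kc_rc=1
        fi
    done
    return "$kc_rc"
}

# The six cases from the list above, in the same order.
run_all() {
    kill_and_check TERM traffic_manager
    kill_and_check KILL traffic_manager
    kill_and_check TERM traffic_server
    kill_and_check KILL traffic_server
    kill_and_check TERM traffic_manager traffic_server
    kill_and_check KILL traffic_manager traffic_server
}
```

Sourcing the file and calling run_all as a user allowed to signal the processes
runs the cases in order. Each case assumes the previous restart has completed,
so a pause between cases may be needed on a slow machine.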
was (Author: vleschuk):
Whe the issue was reproduced one could see it in syslog (journalctl): numerous
messages "unable to retrieve manager_binary". After applying the attached
patches the issue was gone, the processes were restarted correctly by
traffic_cop. The following tests were made:
* kill `pgrep traffic_manager`
* kill -9 `pgrep traffic_manager`
* kill `pgrep traffic_server`
* kill -9 `pgrep traffic_server`
* kill `pgrep traffic_manager`; kill `pgrep traffic_server`
* kill -9 `pgrep traffic_manager`; kill -9 `pgrep traffic_server`
In all cases both manager and traffic_server were restarted correctly, no
endless loop of traffic_cop trying to restart manager was seen.
> traffic_cop can't restart traffic_manager properly
> --------------------------------------------------
>
> Key: TS-3104
> URL: https://issues.apache.org/jira/browse/TS-3104
> Project: Traffic Server
> Issue Type: Bug
> Components: Cop
> Reporter: Victor
> Attachments: ts-0022-fix-lockfile-killgroup.patch,
> ts-0023-cop-reinit-mgr-api-on-failure.patch
>
>
> In some cases traffic_cop can't restart traffic_manager properly. We ran into
> these issues at "Ashmanov and partners" (http://en.ashmanov.com/). There are
> two places in the code which, in my opinion, need correction:
> 1) The logic which decides whether to kill a single process or its group.
> 2) The main traffic_cop loop: it doesn't reinitialize the manager API after a
> failure, which leads to constant attempts to connect to the manager using
> socket id == -1.
> I have prepared patches for both issues. Please kindly take a look at them
> and let me know your thoughts.