[
https://issues.apache.org/jira/browse/MESOS-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730063#comment-16730063
]
Meng Zhu commented on MESOS-9502:
---------------------------------
[~jieyu] mentioned that one possibility is pid reuse. When the I/O switchboard
terminates and the agent restarts, the old I/O switchboard pid is reaped by
init. After failover the agent reaps on that old pid; in the common case this
returns None() immediately. In the corner case where the pid has been reused
by another process, however, the agent can get stuck.
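
To illustrate the pid-reuse hypothesis, here is a minimal Python sketch. It is
not Mesos code: it assumes the agent decides that a non-child process has
exited by checking whether its pid still exists, and it uses a toy process
table instead of the real kernel. FakeProcessTable, exited_by_pid and
exited_by_identity are hypothetical names. Comparing a recorded start time is
one common way to tell a reused pid apart, not necessarily the fix that will
be chosen here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProcessInfo:
    pid: int
    start_time: int  # hypothetical start tick, unique per spawned process


class FakeProcessTable:
    """Toy stand-in for the kernel process table (assumption, not a real API)."""

    def __init__(self) -> None:
        self._procs: Dict[int, ProcessInfo] = {}
        self._clock = 0

    def spawn(self, pid: int) -> ProcessInfo:
        # Every spawn gets a fresh start tick, even if the pid number repeats.
        self._clock += 1
        info = ProcessInfo(pid, self._clock)
        self._procs[pid] = info
        return info

    def exit(self, pid: int) -> None:
        # The process is gone and its pid becomes available for reuse.
        self._procs.pop(pid, None)

    def lookup(self, pid: int) -> Optional[ProcessInfo]:
        return self._procs.get(pid)


def exited_by_pid(table: FakeProcessTable, pid: int) -> bool:
    """Naive check: the process has exited once the pid no longer exists."""
    return table.lookup(pid) is None


def exited_by_identity(table: FakeProcessTable, recorded: ProcessInfo) -> bool:
    """Stricter check: the pid is gone, or it now belongs to a newer process."""
    current = table.lookup(recorded.pid)
    return current is None or current.start_time != recorded.start_time


table = FakeProcessTable()
switchboard = table.spawn(62854)  # I/O switchboard server, pid checkpointed
table.exit(62854)                 # it terminates and is reaped by init
table.spawn(62854)                # an unrelated process reuses the same pid

stuck = not exited_by_pid(table, 62854)             # naive check keeps waiting
detected = exited_by_identity(table, switchboard)   # identity check notices
```

With the naive check the waiter keeps seeing a live pid and never completes,
which matches the stuck cleanup described in the issue. The identity check
treats the reused pid as a different process and returns immediately.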
> IOswitchboard cleanup could get stuck.
> --------------------------------------
>
> Key: MESOS-9502
> URL: https://issues.apache.org/jira/browse/MESOS-9502
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.7.0
> Reporter: Meng Zhu
> Priority: Critical
>
> Our check container got stuck during destroy, which in turn blocked the
> parent container. The destroy is blocked by the I/O switchboard cleanup:
> 1223 18:04:41.000000 16269 switchboard.cpp:814] Sending SIGTERM to I/O
> switchboard server (pid: 62854) since container
> 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e
> is being destroyed
> ....
> 1227 04:45:38.000000 5189 switchboard.cpp:916] I/O switchboard server
> process for container
> 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e
> has terminated (status=N/A)
> Note the timestamps: the termination was logged more than three days after
> the SIGTERM was sent.