[ 
https://issues.apache.org/jira/browse/MESOS-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730063#comment-16730063
 ] 

Meng Zhu commented on MESOS-9502:
---------------------------------

[~jieyu] mentioned that one possibility is pid reuse. When the I/O switchboard 
terminates and the agent restarts, the old pid of the I/O switchboard is reaped 
by init. After failover, the agent still tries to reap that old pid. In the 
common case this returns None() immediately. In the corner case where the pid 
has been reused by another process, however, the agent ends up waiting on that 
unrelated process and can get stuck.
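
To illustrate the hazard, here is a minimal sketch of one common guard against 
pid reuse on Linux: record the process start time together with the pid, and 
only treat the pid as "ours" if the start time still matches. This is not the 
Mesos implementation or a proposed patch. The helper names 
(parse_start_time, read_start_time, is_same_process) are hypothetical, and it 
assumes /proc/<pid>/stat is available.

```python
import os
from typing import Optional


def parse_start_time(stat_line: str) -> int:
    """Return the starttime field (field 22) of a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split on the last ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 22 is rest[19].
    return int(rest[19])


def read_start_time(pid: int) -> Optional[int]:
    """Return the start time of `pid`, or None if no such process is visible."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_start_time(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return None


def is_same_process(pid: int, checkpointed_start_time: int) -> bool:
    """True only if `pid` exists and started at the checkpointed time.

    A reused pid belongs to a process with a different start time, so the
    caller can skip waiting on it instead of blocking on a stranger.
    """
    current = read_start_time(pid)
    return current is not None and current == checkpointed_start_time
```

With a check like this, a recovering agent would checkpoint the start time next 
to the pid and, after failover, treat a mismatch the same way as "process 
already gone" rather than waiting for the unrelated process to exit.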

> IOswitchboard cleanup could get stuck.
> --------------------------------------
>
>                 Key: MESOS-9502
>                 URL: https://issues.apache.org/jira/browse/MESOS-9502
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.7.0
>            Reporter: Meng Zhu
>            Priority: Critical
>
> Our check container got stuck during destroy, which in turn blocked the 
> parent container. The destroy is blocked by the I/O switchboard cleanup:
> 1223 18:04:41.000000 16269 switchboard.cpp:814] Sending SIGTERM to I/O 
> switchboard server (pid: 62854) since container 
> 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e
>  is being destroyed
> ....
> 1227 04:45:38.000000  5189 switchboard.cpp:916] I/O switchboard server 
> process for container 
> 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e
>  has terminated (status=N/A)
> Note the timestamps: the termination was logged more than three days after 
> the SIGTERM was sent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
