[ 
https://issues.apache.org/jira/browse/MESOS-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730455#comment-16730455
 ] 

Meng Zhu commented on MESOS-9502:
---------------------------------

[~abudnik] pointed out that this is likely caused by MESOS-6632, and the log 
confirms it.

When a container launch is discarded partway through, but after the I/O 
switchboard (IOSB) has started redirecting I/O to the container, the IOSB 
stays stuck until the agent fails over and all fds are closed. I can confirm 
that a stuck IOSB always comes back and terminates after an agent failover.
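
To illustrate why closing "all fds" matters, here is a minimal sketch of the 
general POSIX behavior, not Mesos code. It assumes the stuck process is 
waiting for EOF on a pipe, which this comment does not state explicitly. The 
names `reports_eof`, `demo`, and `leaked` are hypothetical; `leaked` stands in 
for a copy of the write end that some other process (for example the agent) 
still holds.

```python
import os
import select


def reports_eof(fd, timeout=0.2):
    """Return True if fd reaches EOF within timeout, False if it stays open and idle."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        # No data and no EOF: at least one write end is still open somewhere.
        return False
    # A readable pipe that yields zero bytes has reached EOF.
    return os.read(fd, 1) == b""


def demo():
    read_end, write_end = os.pipe()
    # Hypothetical leaked copy of the write end (assumption: held by another process).
    leaked = os.dup(write_end)

    # The original writer goes away, but the leaked copy keeps the pipe open.
    os.close(write_end)
    before = reports_eof(read_end)

    # Analogous to the remaining fds being closed when the agent fails over.
    os.close(leaked)
    after = reports_eof(read_end)

    os.close(read_end)
    return before, after


if __name__ == "__main__":
    print(demo())
```

In this sketch the reader sees EOF only after the last copy of the write end 
is closed, which matches the observation that the stuck IOSB terminates only 
after an agent failover.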

> IOswitchboard cleanup could get stuck.
> --------------------------------------
>
>                 Key: MESOS-9502
>                 URL: https://issues.apache.org/jira/browse/MESOS-9502
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.7.0
>            Reporter: Meng Zhu
>            Priority: Critical
>
> Our check container got stuck during destroy, which in turn blocked the 
> parent container. The destroy is blocked by the I/O switchboard cleanup:
> 1223 18:04:41.000000 16269 switchboard.cpp:814] Sending SIGTERM to I/O 
> switchboard server (pid: 62854) since container 
> 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e
>  is being destroyed
> ....
> 1227 04:45:38.000000  5189 switchboard.cpp:916] I/O switchboard server 
> process for container 
> 4d4074fa-bc87-471b-8659-08e519b68e13.16d02532-675a-4acb-964d-57459ecf6b67.check-e91521a3-bf72-4ac4-8ead-3950e31cf09e
>  has terminated (status=N/A)
> Note the timestamps: the server terminated more than three days after 
> SIGTERM was sent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
