[ 
https://issues.apache.org/jira/browse/MESOS-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6337:
------------------------------
    Fix Version/s:     (was: 1.1.0)

> Nested containers getting killed before network isolation can be applied to 
> them.
> ---------------------------------------------------------------------------------
>
>                 Key: MESOS-6337
>                 URL: https://issues.apache.org/jira/browse/MESOS-6337
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>         Environment: Linux
>            Reporter: Avinash Sridharan
>            Assignee: Gilbert Song
>              Labels: mesosphere
>
> Seeing this odd behavior in one of our clusters:
> ```
> http.cpp:1948] Failed to launch nested container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: 
> Collect failed: Failed to seed container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: 
> Collect failed: Failed to setup hostname and network files: Failed to enter 
> the mount namespace of pid 21591: Pid 21591 does not exist
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.894485 
> 31531 containerizer.cpp:1931] Destroying container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e in 
> ISOLATING state
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.894439 
> 31531 containerizer.cpp:2300] Container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e has 
> exited
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.854456 
> 31534 systemd.cpp:96] Assigned child process '21591' to 
> 'mesos_executors.slice'
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: W1007 02:05:55.831861 
> 21580 process.cpp:882] Failed SSL connections will be downgraded to a non-SSL 
> socket
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: NOTE: Set 
> LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.831526 
> 21580 openssl.cpp:432] Will only verify peer certificate if presented!
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: NOTE: Set 
> LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.831521 
> 21580 openssl.cpp:426] Will not verify peer certificate!
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: I1007 02:05:55.831511 
> 21580 openssl.cpp:421] CA directory path unspecified! NOTE: Set CA directory 
> path with LIBPROCESS_SSL_CA_DIR=<dirpath>
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: W1007 02:05:55.831405 
> 21580 openssl.cpp:399] Failed SSL connections will be downgraded to a non-SSL 
> socket
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: WARNING: Logging before 
> InitGoogleLogging() is written to STDERR
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: W1007 02:05:55.828413 
> 21581 process.cpp:882] Failed SSL connections will be downgraded to a non-SSL 
> socket
> Oct 07 02:05:55 ip-10-10-0-207 mesos-agent[31520]: NOTE: Set 
> LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
> ```
> The above log is in reverse chronological order, so please read it bottom-up.
> The relevant log is:
> ```
> http.cpp:1948] Failed to launch nested container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: 
> Collect failed: Failed to seed container 
> cb92634b-42b3-40f3-94f7-609f89a362bc.46d884e4-d0eb-4572-be1d-24414df7cb2e: 
> Collect failed: Failed to setup hostname and network files: Failed to enter 
> the mount namespace of pid 21591: Pid 21591 does not exist
> ```
> Looks like the nested container failed to launch because the `isolate` call 
> to the `network/cni` isolator failed. It seems that by the time the isolator 
> received the `isolate` call, the PID for the nested container had already 
> exited, so the isolator couldn't enter its mount namespace to set up the 
> network files. 
> The odd thing here is that the nested container would have been frozen, and 
> hence was not running, so I am not sure what killed it. My suspicion falls on 
> systemd, since I also see this log message:
> ```
> Oct 07 18:02:31 ip-10-10-0-207 mesos-agent[31520]: I1007 18:02:31.473656 
> 31532 systemd.cpp:96] Assigned child process '1596' to 'mesos_executors.slice'
> ```
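
The failure above comes down to whether the nested container's PID still exists, with a mount namespace to enter, at the moment `isolate` runs. A minimal sketch of that check in Python, assuming a Linux `/proc` layout; the function name `mount_namespace_of` and the `proc_root` parameter are hypothetical and are not part of Mesos:

```python
import os


def mount_namespace_of(pid, proc_root="/proc"):
    """Return the mount namespace link target for `pid`, or None if the
    process does not exist or its namespace link cannot be read."""
    # On Linux, /proc/<pid>/ns/mnt is a symlink such as "mnt:[4026531840]".
    link = os.path.join(proc_root, str(pid), "ns", "mnt")
    try:
        return os.readlink(link)
    except OSError:
        # Covers a missing PID as well as permission errors.
        return None
```

Calling `mount_namespace_of(21591)` on the agent host right after the launch would return `None` if the process had already exited, which would match the "Pid 21591 does not exist" error in the log.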



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
