[
https://issues.apache.org/jira/browse/MESOS-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jojy Varghese updated MESOS-3588:
---------------------------------
Assignee: (was: Jojy Varghese)
> Port mapping isolator check failed: createQdisc.get()
> -----------------------------------------------------
>
> Key: MESOS-3588
> URL: https://issues.apache.org/jira/browse/MESOS-3588
> Project: Mesos
> Issue Type: Bug
> Reporter: Paul Brett
>
> Container creation occasionally fails because the required name already
> exists, e.g.:
> {code}
> F1005 13:25:04.331053 48582 port_mapping.cpp:2245] Check failed: createQdisc.get()
> *** Check failure stack trace: ***
> @ 0x7f3b5c3b668d google::LogMessage::Fail()
> @ 0x7f3b5c3b84d4 google::LogMessage::SendToLog()
> @ 0x7f3b5c3b627c google::LogMessage::Flush()
> @ 0x7f3b5c3b8dc9 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f3b5c0bdc8c mesos::internal::slave::PortMappingIsolatorProcess::isolate()
> @ 0x7f3b5bf28fd6 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave20MesosIsolatorProcessERKNS6_11ContainerIDEiSA_iEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f3b5c3690b1 process::ProcessManager::resume()
> @ 0x7f3b5c3693af process::internal::schedule()
> @ 0x7f3b5c478cd0 execute_native_thread_routine
> @ 0x7f3b5b14283d start_thread
> @ 0x7f3b5abb7fdd clone
> /usr/local/bin/mesos-slave.sh: line 102: 48575 Aborted (core dumped) $debug /usr/local/sbin/mesos-slave "${MESOS_FLAGS[@]}"
> Slave Exit Status: 134
> {code}
>
> It appears that there are valid circumstances under which the kernel can
> reallocate the namespace PID before the container's external interface
> (mesos_nnnnn) has been destroyed.
> {code}
> 2236   // Prepare the ingress queueing disciplines on veth.
> 2237   Try<bool> createQdisc = ingress::create(veth(pid));
> 2238   if (createQdisc.isError()) {
> 2239     return Failure(
> 2240         "Failed to create the ingress qdisc on " + veth(pid) +
> 2241         ": " + createQdisc.error());
> 2242   }
> 2243
> 2244   // Veth device should exist since we just created it.
> 2245   CHECK(createQdisc.get());
> {code}
> We should test for "link already exists" errors in port mapping (e.g.
> link::create returns false) and fail the container creation rather than
> killing the slave.
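
The handling proposed above could look roughly like the following sketch. It is not the Mesos patch: `TryBool`, `IsolateResult`, and `prepareIngress` are hypothetical stand-ins for stout's `Try<bool>`, the `Failure` returned from `isolate()`, and the relevant part of `PortMappingIsolatorProcess::isolate()`, so that the example compiles on its own. The wording of the second failure message is also an assumption. The point is only that a `false` result becomes a returned failure instead of a fatal `CHECK`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for stout's Try<bool>: holds either a value or an
// error message. Only the three accessors used in the issue are modeled.
struct TryBool {
  bool hasError;
  bool value;
  std::string message;

  static TryBool ok(bool v) { return {false, v, ""}; }
  static TryBool fail(std::string m) { return {true, false, std::move(m)}; }

  bool isError() const { return hasError; }
  bool get() const { return value; }
  const std::string& error() const { return message; }
};

// Hypothetical stand-in for the outcome of isolate(): either success or a
// failure that is reported for this one container.
struct IsolateResult {
  bool failed;
  std::string message;
};

// Sketch of the suggested behavior. `createQdisc` plays the role of the
// result of ingress::create(veth(pid)) in port_mapping.cpp.
IsolateResult prepareIngress(const std::string& veth,
                             const TryBool& createQdisc) {
  if (createQdisc.isError()) {
    return {true,
            "Failed to create the ingress qdisc on " + veth + ": " +
                createQdisc.error()};
  }

  // Previously: CHECK(createQdisc.get()), which aborts the whole slave.
  // Here a false result only fails the container being created.
  if (!createQdisc.get()) {
    return {true,
            "Ingress qdisc was not created on " + veth +
                " (unexpected pre-existing state)"};
  }

  return {false, ""};
}
```

With this shape, a stale `mesos_nnnnn` interface left over from a reused PID would cause one container launch to fail, and the slave would keep running.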
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)