[ https://issues.apache.org/jira/browse/MESOS-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821377#comment-16821377 ]
Greg Mann commented on MESOS-9609:
----------------------------------

It looks like {{Master::_markUnreachable()}} is executed in a continuation after we perform a registry operation, but it assumes that the framework still exists when it synchronously invokes {{__removeSlave()}}. The framework could be removed in between {{markUnreachable()}} and {{_markUnreachable()}}. When that happens, the non-NULL check on {{framework}} in {{__removeSlave()}} fails, which matches the log below.

https://github.com/apache/mesos/blob/e2b95cdf26844c6c32aa4e44c3582b33570bd330/src/master/master.cpp#L9147-L9156

> Master check failure when marking agent unreachable
> ---------------------------------------------------
>
> Key: MESOS-9609
> URL: https://issues.apache.org/jira/browse/MESOS-9609
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.5.0
> Reporter: Greg Mann
> Assignee: Greg Mann
> Priority: Critical
> Labels: foundations, mesosphere
>
> {code}
> Mar 11 10:04:33 research docker[4503]: I0311 10:04:33.815433 13 http.cpp:1185] HTTP POST for /master/api/v1/scheduler from 10.142.0.5:55133
> Mar 11 10:04:33 research docker[4503]: I0311 10:04:33.815588 13 master.cpp:5467] Processing DECLINE call for offers: [ 5e57f633-a69c-4009-b773-990b4b8984ad-O58323 ] for framework 5e57f633-a69c-4009-b7
> Mar 11 10:04:33 research docker[4503]: I0311 10:04:33.815693 13 master.cpp:10703] Removing offer 5e57f633-a69c-4009-b773-990b4b8984ad-O58323
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820142 10 master.cpp:8227] Marking agent 5e57f633-a69c-4009-b773-990b4b8984ad-S49 at slave(1)@10.142.0.10:5051 (tf-mesos-agent-t7c8.c.bitcoin-engi
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820367 10 registrar.cpp:495] Applied 1 operations in 86528ns; attempting to update the registry
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820572 10 registrar.cpp:552] Successfully updated the registry in 175872ns
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820642 11 master.cpp:8275] Marked agent 5e57f633-a69c-4009-b773-990b4b8984ad-S49 at slave(1)@10.142.0.10:5051 (tf-mesos-agent-t7c8.c.bitcoin-engin
> Mar 11 10:04:35 research docker[4503]: I0311 10:04:35.820957 9 hierarchical.cpp:609] Removed agent 5e57f633-a69c-4009-b773-990b4b8984ad-S49
> Mar 11 10:04:35 research docker[4503]: F0311 10:04:35.851961 11 master.cpp:10018] Check failed: 'framework' Must be non NULL
> Mar 11 10:04:35 research docker[4503]: *** Check failure stack trace: ***
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c6044a7d google::LogMessage::Fail()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c6046830 google::LogMessage::SendToLog()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c6044663 google::LogMessage::Flush()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c6047259 google::LogMessageFatal::~LogMessageFatal()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c5258e14 google::CheckNotNull<>()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c521dfc8 mesos::internal::master::Master::__removeSlave()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c521f1a2 mesos::internal::master::Master::_markUnreachable()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c5f98f11 process::ProcessBase::consume()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c5fb2a4a process::ProcessManager::resume()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c5fb65d6 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c35d4c80 (unknown)
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c2de76ba start_thread
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c2b1d41d (unknown)
> Mar 11 10:04:36 research docker[4503]: *** Aborted at 1520762676 (unix time) try "date -d @1520762676" if you are using GNU date ***
> Mar 11 10:04:36 research docker[4503]: PC: @ 0x7f96c2a4d196 (unknown)
> Mar 11 10:04:36 research docker[4503]: *** SIGSEGV (@0x0) received by PID 1 (TID 0x7f96b986d700) from PID 0; stack trace: ***
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c2df1390 (unknown)
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c2a4d196 (unknown)
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c604ce2c google::DumpStackTraceAndExit()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c6044a7d google::LogMessage::Fail()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c6046830 google::LogMessage::SendToLog()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c6044663 google::LogMessage::Flush()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c6047259 google::LogMessageFatal::~LogMessageFatal()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c5258e14 google::CheckNotNull<>()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c521dfc8 mesos::internal::master::Master::__removeSlave()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c521f1a2 mesos::internal::master::Master::_markUnreachable()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c5f98f11 process::ProcessBase::consume()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c5fb2a4a process::ProcessManager::resume()
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c5fb65d6 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c35d4c80 (unknown)
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c2de76ba start_thread
> Mar 11 10:04:36 research docker[4503]: @ 0x7f96c2b1d41d (unknown)
> Mar 11 10:04:38 research systemd[1]: mesos-master2.service: main process exited, code=exited, status=139/n/a
> Mar 11 10:04:38 research docker[18886]: mesos-master
> Mar 11 10:04:38 research systemd[1]: Unit mesos-master2.service entered failed state.
> {code}