[
https://issues.apache.org/jira/browse/MESOS-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452536#comment-16452536
]
Greg Mann commented on MESOS-8687:
----------------------------------
Indeed, it seems likely that {{_consume()}} is dispatched by one instance of
the master actor, but is actually executed on a second instance of the master
actor after master failover. As suggested by [~bennoe], perhaps we could add a
{{Clock::settle()}} immediately after the master is reset in the test.
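To illustrate the suspected race, here is a minimal toy model. It is not libprocess code: {{Runtime}}, {{Actor}}, {{failover()}} and this {{settle()}} are hypothetical stand-ins, and the sketch assumes that queued events are addressed by actor id rather than by instance, and that draining the queue drops events whose target no longer exists.

```cpp
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy model only: this is NOT libprocess. All names are hypothetical.
struct Actor {
  explicit Actor(int g) : generation(g) {}
  int generation;  // Which "master" instance this is (1 = old, 2 = new).
};

class Runtime {
public:
  void spawn(const std::string& id, Actor* actor) { actors[id] = actor; }
  void terminate(const std::string& id) { actors.erase(id); }

  // Events are addressed by id, not by instance pointer (an assumption).
  void dispatch(const std::string& id, std::function<void(Actor&)> f) {
    queue.emplace_back(id, std::move(f));
  }

  // Drain all pending events; events whose target is gone are dropped.
  void settle() {
    while (!queue.empty()) {
      auto event = std::move(queue.front());
      queue.pop_front();
      auto it = actors.find(event.first);
      if (it != actors.end()) {
        event.second(*it->second);
      }
    }
  }

private:
  std::map<std::string, Actor*> actors;
  std::deque<std::pair<std::string, std::function<void(Actor&)>>> queue;
};

// Returns the generation of the instance that executed the event that was
// dispatched before failover, or 0 if that event was dropped.
int failover(bool settleAfterReset) {
  Runtime runtime;
  Actor first(1);
  Actor second(2);
  int ranOn = 0;

  runtime.spawn("master", &first);
  runtime.dispatch("master", [&ranOn](Actor& a) { ranOn = a.generation; });

  runtime.terminate("master");  // The "master reset" step of the test.
  if (settleAfterReset) {
    runtime.settle();           // Flush stale events before the restart.
  }

  runtime.spawn("master", &second);  // New instance reuses the same id.
  runtime.settle();
  return ranOn;
}
```

In this model, {{failover(false)}} returns 2 (the stale event runs on the second instance), while {{failover(true)}} returns 0 (the stale event is dropped before the new instance starts). Whether the real test behaves the same way would need to be confirmed against the attached logs.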
> Check failure in `ProcessBase::_consume()`.
> -------------------------------------------
>
> Key: MESOS-8687
> URL: https://issues.apache.org/jira/browse/MESOS-8687
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Affects Versions: 1.6.0
> Environment: ec2 CentOS 7 with SSL
> Reporter: Alexander Rukletsov
> Assignee: Benjamin Mahler
> Priority: Major
> Labels: flaky-test, reliability
> Attachments: MasterAPITest.MasterFailover-with-CHECK.txt, MasterFailover-badrun.txt
>
>
> Observed a CHECK failure (process abort) in the {{MasterAPITest.MasterFailover}} test:
> {noformat}
> 10:59:04 I0319 10:59:04.312197 3274 master.cpp:649] Authorization enabled
> 10:59:04 F0319 10:59:04.312772 3274 owned.hpp:110] Check failed: 'get()' Must be non NULL
> 10:59:04 *** Check failure stack trace: ***
> 10:59:04 I0319 10:59:04.313470 3279 hierarchical.cpp:175] Initialized hierarchical allocator process
> 10:59:04 I0319 10:59:04.313500 3279 whitelist_watcher.cpp:77] No whitelist given
> 10:59:04 @ 0x7fe82d44e0cd google::LogMessage::Fail()
> 10:59:04 @ 0x7fe82d44ff1d google::LogMessage::SendToLog()
> 10:59:04 @ 0x7fe82d44dcb3 google::LogMessage::Flush()
> 10:59:04 @ 0x7fe82d450919 google::LogMessageFatal::~LogMessageFatal()
> 10:59:04 @ 0x7fe82d3cee16 google::CheckNotNull<>()
> 10:59:04 @ 0x7fe82d3b4253 process::ProcessBase::_consume()
> 10:59:04 @ 0x7fe82d3b4a66 _ZNO6lambda12CallableOnceIFN7process6FutureINS1_4http8ResponseEEEvEE10CallableFnINS_8internal7PartialIZNS1_11ProcessBase7consumeEONS1_9HttpEventEEUlRKNS1_5OwnedINS3_7RequestEEEE_JSG_EEEEclEv
> 10:59:04 @ 0x7fe82c39c3ca _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureINS1_4http8ResponseEEEEclINS0_IFSE_vEEEEESE_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISD_EESt14default_deleteISQ_EEOSI_S3_E_JST_SI_St12_PlaceholderILi1EEEEEEclEOS3_
> 10:59:04 @ 0x7fe82d39f2c1 process::ProcessBase::consume()
> 10:59:04 @ 0x7fe82d3b84da process::ProcessManager::resume()
> 10:59:04 @ 0x7fe82d3bbf56 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 10:59:04 @ 0x7fe82d577870 execute_native_thread_routine
> 10:59:04 @ 0x7fe82a761e25 start_thread
> 10:59:04 @ 0x7fe82986334d __clone
> {noformat}
> Full test log is attached.