[ 
https://issues.apache.org/jira/browse/MESOS-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831308#comment-16831308
 ] 

Greg Mann commented on MESOS-9698:
----------------------------------

It looks like this failure occurs because the master sends two 
{{ReconcileOperationsMessage}}s to the agent for the same operation. This leads 
to two OPERATION_DROPPED updates for that operation; after the master processes 
the first one, it removes the operation. Then, when the second one is received, 
we hit [this 
block|https://github.com/apache/mesos/blob/3a0c2fa2ae338eeab292e7c0d3dae55b66f5886d/src/master/master.cpp#L8817-L8836]
 and incorrectly assume that the framework ID is SOME.
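
To make the sequence concrete, here is a minimal, self-contained sketch of the 
race. It is not the Mesos code: {{ToyMaster}}, {{StatusUpdate}}, {{Operation}} 
and {{Outcome}} are hypothetical names, and {{std::optional}} stands in for 
stout's {{Option}}. It only illustrates checking for a missing framework ID 
before reading it when a duplicate update arrives for an already-removed 
operation.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for illustration only; these are NOT Mesos types.
struct Operation {
  std::optional<std::string> frameworkId;
};

struct StatusUpdate {
  std::string operationUuid;
  std::optional<std::string> frameworkId;  // May be unset on the update.
};

// Hypothetical outcomes; the names are assumptions made for this sketch.
enum class Outcome { FORWARDED, ACKNOWLEDGED_UNKNOWN, IGNORED_NO_FRAMEWORK };

class ToyMaster {
public:
  void addOperation(const std::string& uuid, const std::string& frameworkId) {
    operations[uuid] = Operation{frameworkId};
  }

  Outcome updateOperationStatus(const StatusUpdate& update) {
    auto it = operations.find(update.operationUuid);
    if (it != operations.end()) {
      // First terminal update: the operation is known, so handle it and
      // remove it from the master's bookkeeping.
      operations.erase(it);
      return Outcome::FORWARDED;
    }

    // Duplicate update: the operation is already gone. Check the optional
    // before reading it instead of assuming it holds a value.
    if (!update.frameworkId.has_value()) {
      return Outcome::IGNORED_NO_FRAMEWORK;
    }

    return Outcome::ACKNOWLEDGED_UNKNOWN;
  }

private:
  std::unordered_map<std::string, Operation> operations;
};
```

Sending the same update twice to a {{ToyMaster}} yields {{FORWARDED}} for the 
first call and {{IGNORED_NO_FRAMEWORK}} for the second, rather than an 
assertion failure. What the real master should do in that branch is a separate 
decision; the sketch only shows the guard.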

> DroppedOperationStatusUpdate test is flaky
> ------------------------------------------
>
>                 Key: MESOS-9698
>                 URL: https://issues.apache.org/jira/browse/MESOS-9698
>             Project: Mesos
>          Issue Type: Bug
>         Environment: Debian 8
>            Reporter: Andrei Budnik
>            Assignee: Greg Mann
>            Priority: Major
>              Labels: flaky-test, foundations, mesosphere, operation-feedback
>         Attachments: DroppedOperationStatusUpdate-badrun1.txt
>
>
> DroppedOperationStatusUpdate test failed with the following backtrace:
> {code:java}
> 06:50:21 mesos-tests: ../../3rdparty/stout/include/stout/option.hpp:120: T& 
> Option<T>::get() & [with T = mesos::FrameworkID]: Assertion `isSome()' failed.
> 06:50:21 *** Aborted at 1554360620 (unix time) try "date -d @1554360620" if 
> you are using GNU date ***
> 06:50:21 I0404 06:50:20.663539 16308 scheduler.cpp:847] Enqueuing event 
> OFFERS received from http://172.16.10.126:42550/master/api/v1/scheduler
> 06:50:21 I0404 06:50:20.663702 16308 scheduler.cpp:847] Enqueuing event 
> UPDATE_OPERATION_STATUS received from 
> http://172.16.10.126:42550/master/api/v1/scheduler
> 06:50:21 PC: @     0x7fa726c66067 (unknown)
> 06:50:21 *** SIGABRT (@0x6fad) received by PID 28589 (TID 0x7fa71dfc9700) 
> from PID 28589; stack trace: ***
> 06:50:21     @     0x7fa726feb890 (unknown)
> 06:50:21     @     0x7fa726c66067 (unknown)
> 06:50:21     @     0x7fa726c67448 (unknown)
> 06:50:21     @     0x7fa726c5f266 (unknown)
> 06:50:21     @     0x7fa726c5f312 (unknown)
> 06:50:21     @     0x7fa72a1be89a 
> _ZNR6OptionIN5mesos11FrameworkIDEE3getEv.part.500
> 06:50:21     @     0x7fa72a54002a 
> mesos::internal::master::Master::updateOperationStatus()
> 06:50:21     @     0x7fa72a5c583b ProtobufProcess<>::_handlerMutM<>()
> 06:50:21     @     0x7fa72a58e680 ProtobufProcess<>::consume()
> 06:50:21     @     0x7fa72a50cf04 mesos::internal::master::Master::_consume()
> 06:50:21     @     0x7fa72a52975d mesos::internal::master::Master::consume()
> 06:50:21     @     0x7fa72b60b1d3 process::ProcessManager::resume()
> 06:50:21     @     0x7fa72b610ea6 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 06:50:21     @     0x7fa7277c6970 (unknown)
> 06:50:21     @     0x7fa726fe4064 start_thread
> 06:50:21     @     0x7fa726d1962d (unknown)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
