[ 
https://issues.apache.org/jira/browse/MESOS-9555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762066#comment-16762066
 ] 

Jeff Pollard edited comment on MESOS-9555 at 2/7/19 4:28 PM:
-------------------------------------------------------------

Thanks for the links to the fixed bugs. I wasn't sure whether you had a 
specific issue in mind that might match ours. I'll spend some time reading 
the fixed bugs that look relevant.

I'll need to check internally to see what our schedule is for upgrading Mesos 
and will report back.

EDIT: Decided to remove the log download link. If you'd like it, let me know 
and I'd be happy to email it.


was (Author: fluxx):
 

Full log for the master available here: 
[https://www.dropbox.com/s/vzoaf8pva95qs8c/mesos.log?dl=0]

Thanks for the links to the fixed bugs. I wasn't sure whether you had a 
specific issue in mind that might match ours. I'll spend some time reading 
the fixed bugs that look relevant.

I'll need to check internally to see what our schedule is for upgrading Mesos 
and will report back.

> Check failed: reservationScalarQuantities.contains(role)
> --------------------------------------------------------
>
>                 Key: MESOS-9555
>                 URL: https://issues.apache.org/jira/browse/MESOS-9555
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation, master
>    Affects Versions: 1.5.0
>         Environment: * Mesos 1.5
>  * {{DISTRIB_ID=Ubuntu}}
>  * {{DISTRIB_RELEASE=16.04}}
>  * {{DISTRIB_CODENAME=xenial}}
>  * {{DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"}}
>            Reporter: Jeff Pollard
>            Priority: Critical
>
> We recently upgraded our Mesos cluster from version 1.3 to 1.5, and since 
> then have been getting periodic master crashes due to this error:
> {code:java}
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: F0205 15:53:57.385118 8434 hierarchical.cpp:2630] Check failed: reservationScalarQuantities.contains(role){code}
> Full stack trace is at the end of this issue description. When the master 
> fails, we automatically restart it and it rejoins the cluster just fine. I 
> did some initial searching and was unable to find any existing bug reports or 
> other people experiencing this issue. We run a cluster of 3 masters, and see 
> crashes on all 3 instances.
> Right before the crash, we saw a {{Removed agent:...}} log line showing that 
> agent 9b912afa-1ced-49db-9c85-7bc5a22ef072-S6 had been removed.
> {code:java}
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: I0205 15:53:57.384759 8432 master.cpp:9893] Removed agent 9b912afa-1ced-49db-9c85-7bc5a22ef072-S6 at slave(1)@10.0.18.78:5051 (10.0.18.78): the agent unregistered{code}
> I saved the full log from the master, so I'm happy to provide more info from 
> it, or anything else about our current environment.
> Full stack trace is below.
> {code:java}
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e9170a7d google::LogMessage::Fail()
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e9172830 google::LogMessage::SendToLog()
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e9170663 google::LogMessage::Flush()
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e9173259 google::LogMessageFatal::~LogMessageFatal()
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e8443cbd mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::untrackReservations()
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e8448fcd mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::removeSlave()
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e90c4f11 process::ProcessBase::consume()
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e90dea4a process::ProcessManager::resume()
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e90e25d6 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e6700c80 (unknown)
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e5f136ba start_thread
> Feb 5 15:53:57 ip-10-0-16-140 mesos-master[8414]: @ 0x7f87e5c4941d (unknown){code}
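
The issue description pairs a fatal "Check failed" line with a "Removed agent" line logged just before it. A saved master log could be scanned for the same pattern with a small script. This is only a rough sketch: it assumes syslog-prefixed glog records like the ones quoted above, with each record on a single line (not wrapped). The function name `removals_before_fatal`, the regular expressions, and the `window` parameter are illustrative assumptions and not part of Mesos.

```python
import re

# Matches the glog part of a record, e.g.
# "F0205 15:53:57.385118 8434 hierarchical.cpp:2630] Check failed: ..."
# Any syslog prefix in front of it is skipped by using search().
GLOG_LINE = re.compile(
    r"\b(?P<severity>[IWEF])\d{4} "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ "
    r"(?P<source>[^\s:]+:\d+)\] (?P<message>.*)"
)
REMOVED_AGENT = re.compile(r"Removed agent (?P<agent>\S+)")


def removals_before_fatal(lines, window=20):
    """For each fatal "Check failed" record, report the agent named in the
    most recent "Removed agent" record at most `window` lines earlier.

    Returns a list of (source, message, agent_or_None) tuples.
    """
    last_removed = None  # (line index, agent id) of the latest removal seen
    findings = []
    for index, line in enumerate(lines):
        match = GLOG_LINE.search(line)
        if match is None:
            continue  # not a glog record (e.g. a stack-trace frame)
        severity = match.group("severity")
        message = match.group("message")
        removed = REMOVED_AGENT.match(message)
        if severity == "I" and removed:
            last_removed = (index, removed.group("agent"))
        elif severity == "F" and message.startswith("Check failed"):
            agent = None
            if last_removed is not None and index - last_removed[0] <= window:
                agent = last_removed[1]
            findings.append((match.group("source"), message, agent))
    return findings
```

It might be used as `removals_before_fatal(open("mesos.log"))`, where the file name is a placeholder. A result whose last element is `None` means no agent removal was seen within the window before that crash.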



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
