[ https://issues.apache.org/jira/browse/MESOS-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222439#comment-17222439 ]
Andrei Sekretenko commented on MESOS-10194:
-------------------------------------------

From the logs, it looks like this one (and likely also MESOS-10188) is caused by an attempt to untrack the resources used by the executor when the framework is unknown, which is called from here:
https://github.com/apache/mesos/blob/c1e716054d8dead61074c8619ffa7b33f3064152/src/master/master.cpp#L8503

[~Jerome Soussens] Maybe you can clarify what happened to the framework (I cannot find this in the log): was it just disconnected, or torn down? If the latter, that probably explains why the framework is unknown.

If my understanding is correct, this means that the issue was introduced with the quota usage metrics in 1.10.0.
It is rather odd that the existing tests do not seem to cover the case where `ExitedExecutor` is received after teardown.

The fix is going to be relatively straightforward, although the test might not be.

> Mesos master failure "Check failed: 'get_(role)' Must be SOME"
> --------------------------------------------------------------
>
>                 Key: MESOS-10194
>                 URL: https://issues.apache.org/jira/browse/MESOS-10194
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Jerome Soussens
>            Priority: Critical
>         Attachments: log_mesos_crash_role_13102020.txt
>
>
> *Impact*: mesos-master crashes with this log message:
> {code:java}
> hierarchical.cpp:460] Check failed: 'get_(role)' Must be SOME
> {code}
> *Possible scenario:*
> A framework using a specific role is stopped. At more or less the same time, some remaining task status for this framework reaches the master from the executor, but the role is no longer listed.
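
The failure mode discussed in the comment (untracking resources under a role that is no longer known) can be illustrated with a small standalone sketch. This is not Mesos code: `RoleTracker`, `track`, `removeRole`, `find`, and `untrack` are hypothetical names invented for the example, and the guard shown is only one possible shape for a fix, assuming the tracked state is a plain map keyed by role.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for per-role usage tracking. This is NOT the Mesos
// allocator; all names here are invented for illustration.
class RoleTracker {
public:
  // Record `cpus` as used under `role`, creating the role entry if needed.
  void track(const std::string& role, double cpus) { used_[role] += cpus; }

  // Models the role entry disappearing, e.g. after its last framework is
  // torn down.
  void removeRole(const std::string& role) { used_.erase(role); }

  // Returns a pointer to the tracked amount, or nullptr if the role is
  // unknown (the analogue of an empty Option).
  const double* find(const std::string& role) const {
    std::map<std::string, double>::const_iterator it = used_.find(role);
    return it == used_.end() ? nullptr : &it->second;
  }

  // Guarded untrack: reports failure instead of asserting that the role
  // exists, so a late message for a removed role cannot abort the process.
  bool untrack(const std::string& role, double cpus) {
    std::map<std::string, double>::iterator it = used_.find(role);
    if (it == used_.end()) {
      return false;
    }
    it->second -= cpus;
    return true;
  }

private:
  std::map<std::string, double> used_;
};
```

In this toy model, calling `untrack` after `removeRole` returns `false` rather than crashing. An unguarded version that dereferenced the lookup result unconditionally would correspond to the reported "Must be SOME" check failure.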