[
https://issues.apache.org/jira/browse/MESOS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118527#comment-16118527
]
Ilya Pronin edited comment on MESOS-7867 at 8/18/17 2:47 PM:
-------------------------------------------------------------
We can add those metrics back in {{Master::failoverFramework(Framework*, const
UPID&)}} or simply stop removing them in
{{Master::failoverFramework(Framework*, const HttpConnection&)}}.
{{Master::addFramework()}} adds those metrics regardless of the scheduler
driver type.
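
To make the first option concrete, here is a minimal sketch of the failure and the fix. It is a toy model, not the actual master code: {{ToyMaster}}, {{failoverToHttp}}, {{failoverToPid}}, {{canCountMessage}} and {{downgradeSurvives}} are hypothetical names, and a plain set of principals stands in for the per-principal metrics.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the master's per-framework-principal metrics.
struct ToyMetrics {
  std::set<std::string> frameworks;
};

// Toy model of the master; none of these methods are real Mesos APIs.
struct ToyMaster {
  ToyMetrics metrics;

  // Registration adds the metrics regardless of the driver type.
  void addFramework(const std::string& principal) {
    metrics.frameworks.insert(principal);
  }

  // Failover to an HTTP based driver removes the metrics.
  void failoverToHttp(const std::string& principal) {
    metrics.frameworks.erase(principal);
  }

  // Failover to a PID based driver. With `reinstate` set, the metrics
  // are added back (the proposed fix); otherwise nothing happens, which
  // models the behavior described in this issue.
  void failoverToPid(const std::string& principal, bool reinstate) {
    if (reinstate) {
      metrics.frameworks.insert(principal);
    }
  }

  // Models the condition checked before counting a received message.
  // Returning false corresponds to the failed check in the master log.
  bool canCountMessage(const std::string& principal) const {
    return metrics.frameworks.count(principal) > 0;
  }
};

// Runs the PID -> HTTP -> PID sequence and reports whether a message
// from the downgraded framework could be counted safely afterwards.
bool downgradeSurvives(bool reinstate) {
  ToyMaster master;
  const std::string principal = "example-principal";

  master.addFramework(principal);
  master.failoverToHttp(principal);
  assert(!master.canCountMessage(principal));  // metrics are gone after upgrade

  master.failoverToPid(principal, reinstate);
  return master.canCountMessage(principal);
}
```

In this model {{downgradeSurvives(false)}} returns false, which corresponds to the failed check, and {{downgradeSurvives(true)}} returns true. The second option (not removing the metrics on HTTP failover) would amount to making {{failoverToHttp}} a no-op here.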
[~anandmazumdar], can you shepherd this please?
> Master doesn't handle scheduler driver downgrade from HTTP based to PID based
> -----------------------------------------------------------------------------
>
> Key: MESOS-7867
> URL: https://issues.apache.org/jira/browse/MESOS-7867
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.3.0
> Reporter: Ilya Pronin
> Assignee: Ilya Pronin
>
> When a framework upgrades from a PID based driver to an HTTP based driver,
> the master removes its per-framework-principal metrics ({{messages_received}}
> and {{messages_processed}}) in {{Master::failoverFramework}}. When the same
> framework downgrades back to a PID based driver, the master doesn't reinstate
> those metrics. This causes a crash when the master receives a message from
> the failed over framework and tries to increment the {{messages_received}}
> counter in {{Master::visit(const MessageEvent&)}}.
> {noformat}
> I0807 18:17:45.713220 19095 master.cpp:2916] Framework
> 70822e80-ca38-4470-916e-e6da073a4742-0000 (TwitterScheduler) failed over
> F0807 18:18:20.725908 19079 master.cpp:1451] Check failed:
> metrics->frameworks.contains(principal.get())
> {noformat}