[jira] [Commented] (MESOS-703) master fails to respect updated FrameworkInfo when the framework scheduler restarts

Neil Conway (JIRA) Thu, 28 Jan 2016 11:26:35 -0800

    [ https://issues.apache.org/jira/browse/MESOS-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122149#comment-15122149 ]


Neil Conway commented on MESOS-703:
-----------------------------------

I'm skeptical that we're taking the right approach to solving this problem. To 
me, there are two entirely orthogonal things going on here:

1. A framework reregisters with the master, either because of master failover 
or framework failover (which might happen due to framework 
upgrades/redeployment)
2. A framework wants to change something in its {{FrameworkInfo}}

A framework could easily want to do #2 without needing #1. In fact, some
kinds of {{FrameworkInfo}} changes will be quite complex, requiring that any
cluster state that depends on the old {{FrameworkInfo}} (e.g., running tasks)
be drained before the new {{FrameworkInfo}} can be used.
Conversely, most of the time a framework reregisters, it probably _doesn't_
want to change its {{FrameworkInfo}}, so silently allowing such a change would
hide bugs.

I think we should consider adopting this behavior:

1. Framework reregistration is rejected if the framework presents a different 
{{FrameworkInfo}}
2. We introduce a separate construct for allowing a framework to modify the 
{{FrameworkInfo}} of an active session
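The two rules above can be sketched as a master-side check. This is a minimal illustration in Python, not Mesos code: the real master is written in C++ and {{FrameworkInfo}} is a protobuf message with many more fields. The names {{Master}}, {{reregister}}, {{update_framework}} and {{ReregistrationRejected}} are hypothetical, chosen only to show the proposed split between reregistration and an explicit update.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class FrameworkInfo:
    # Hypothetical, heavily reduced stand-in for the FrameworkInfo message.
    framework_id: str
    name: str
    user: str


class ReregistrationRejected(Exception):
    """Hypothetical error: reregistration presented a different FrameworkInfo."""


class Master:
    def __init__(self) -> None:
        # FrameworkInfo last accepted for each framework ID.
        self.frameworks: Dict[str, FrameworkInfo] = {}

    def register(self, info: FrameworkInfo) -> None:
        self.frameworks[info.framework_id] = info

    def reregister(self, info: FrameworkInfo) -> None:
        # Rule 1: if this master already knows the framework, the
        # reregistering scheduler must present an identical FrameworkInfo.
        # A master that does not know it yet (e.g. after master failover)
        # simply adopts what the scheduler sends.
        known = self.frameworks.get(info.framework_id)
        if known is not None and known != info:
            raise ReregistrationRejected(
                f"FrameworkInfo changed for {info.framework_id}; "
                "use update_framework() instead")
        self.frameworks[info.framework_id] = info

    def update_framework(self, info: FrameworkInfo) -> None:
        # Rule 2: a separate, explicit operation for changing the
        # FrameworkInfo of a framework the master already knows about.
        if info.framework_id not in self.frameworks:
            raise KeyError(info.framework_id)
        self.frameworks[info.framework_id] = info
```

In the MESOS-703 scenario, a scheduler that restarts with {{user}} changed from an empty string to "nobody" would have its reregistration rejected with an explicit error instead of the change being silently ignored. It would then have to call the separate update operation to make the change take effect.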

> master fails to respect updated FrameworkInfo when the framework scheduler 
> restarts
> -----------------------------------------------------------------------------------
>
>                 Key: MESOS-703
>                 URL: https://issues.apache.org/jira/browse/MESOS-703
>             Project: Mesos
>          Issue Type: Epic
>          Components: master
>    Affects Versions: 0.14.0
>         Environment: ubuntu 13.04, mesos 0.14.0-rc3
>            Reporter: Jordan Curzon
>              Labels: gsoc, gsoc2015, mentor, mesosphere, twitter
>
> When I first ran marathon it was running as a personal user and registered 
> with mesos-master as such due to putting an empty string in the user field. 
> When I restarted marathon as "nobody", tasks were still being run as the 
> personal user which didn't exist on the slaves. I know marathon was trying to 
> send a FrameworkInfo with nobody listed as the user because I hard coded it 
> in. The tasks wouldn't run as "nobody" until I restarted the mesos-master. 
> Each time I restarted the marathon framework, it reregistered with 
> mesos-master and mesos-master wrote to the logs that it detected a failover 
> because the scheduler went away and then came back.
> I understand the scheduler failover, but shouldn't mesos-master respect an 
> updated FrameworkInfo when the scheduler re-registers?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
