[
https://issues.apache.org/jira/browse/MESOS-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122149#comment-15122149
]
Neil Conway commented on MESOS-703:
-----------------------------------
I'm skeptical that we're taking the right approach to solving this problem. To
me, there are two entirely orthogonal things going on here:
1. A framework reregisters with the master, either because of master failover
or framework failover (which might happen due to framework
upgrades/redeployment)
2. A framework wants to change something in its {{FrameworkInfo}}
A framework could easily want to do #2 without needing #1 -- in fact, some
kinds of {{FrameworkInfo}} changes will be quite complex, requiring that the
previous cluster state (e.g., running tasks) that depends on the old
{{FrameworkInfo}} be drained before the new {{FrameworkInfo}} can be used.
Conversely, most of the time that a framework reregisters it probably _doesn't_
want to change its {{FrameworkInfo}}, so silently allowing such a change would
hide bugs.
I think we should consider adopting this behavior:
1. Framework reregistration is rejected if the framework presents a
{{FrameworkInfo}} that differs from the one the master has on record
2. We introduce a separate construct for allowing a framework to modify the
{{FrameworkInfo}} of an active session
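To make the proposed split concrete, here is a rough sketch in Python rather than actual master code. Everything in it is hypothetical: the simplified {{FrameworkInfo}} fields, the {{Master}} class, the {{ReregistrationRejected}} exception and the {{update_framework_info}} method are illustrative names only, not existing Mesos APIs. It also assumes that a master with no record of the framework (e.g., after master failover) simply accepts whatever the framework presents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkInfo:
    """Hypothetical, heavily simplified stand-in for the real message."""
    name: str
    user: str


class ReregistrationRejected(Exception):
    """Raised when a reregistering framework presents a changed FrameworkInfo."""


class Master:
    """Toy model of the master's framework registry (not Mesos code)."""

    def __init__(self):
        self.frameworks = {}  # framework id -> FrameworkInfo on record

    def register(self, framework_id, info):
        self.frameworks[framework_id] = info

    def reregister(self, framework_id, info):
        known = self.frameworks.get(framework_id)
        if known is None:
            # Assumption of this sketch: nothing on record (e.g., after
            # master failover), so accept what the framework presents.
            self.frameworks[framework_id] = info
            return
        if known != info:
            # Behavior 1: a changed FrameworkInfo is an error, not a
            # silent update.
            raise ReregistrationRejected(
                f"{framework_id}: FrameworkInfo differs from the one on record")

    def update_framework_info(self, framework_id, info):
        # Behavior 2: an explicit, separate operation for an active session.
        # A real implementation might first need to drain state that
        # depends on the old FrameworkInfo.
        if framework_id not in self.frameworks:
            raise KeyError(framework_id)
        self.frameworks[framework_id] = info
```

With this shape, the scenario in the issue description (restarting with {{user}} changed from an empty string to "nobody") would fail loudly at reregistration instead of being silently ignored, and the framework would have to request the change explicitly through the separate update path.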
> master fails to respect updated FrameworkInfo when the framework scheduler
> restarts
> -----------------------------------------------------------------------------------
>
> Key: MESOS-703
> URL: https://issues.apache.org/jira/browse/MESOS-703
> Project: Mesos
> Issue Type: Epic
> Components: master
> Affects Versions: 0.14.0
> Environment: ubuntu 13.04, mesos 0.14.0-rc3
> Reporter: Jordan Curzon
> Labels: gsoc, gsoc2015, mentor, mesosphere, twitter
>
> When I first ran marathon it was running as a personal user and registered
> with mesos-master as such due to putting an empty string in the user field.
> When I restarted marathon as "nobody", tasks were still being run as the
> personal user which didn't exist on the slaves. I know marathon was trying to
> send a FrameworkInfo with nobody listed as the user because I hard coded it
> in. The tasks wouldn't run as "nobody" until I restarted the mesos-master.
> Each time I restarted the marathon framework, it reregistered with
> mesos-master and mesos-master wrote to the logs that it detected a failover
> because the scheduler went away and then came back.
> I understand the scheduler failover, but shouldn't mesos-master respect an
> updated FrameworkInfo when the scheduler re-registers?