[
https://issues.apache.org/jira/browse/MESOS-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863104#comment-15863104
]
Till Toenshoff edited comment on MESOS-2842 at 2/13/17 1:47 AM:
----------------------------------------------------------------
This is what the master log looks like when this issue occurs:
{noformat}
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213
01:38:28.419044 2809 master.cpp:2783] Subscribing framework integration_test
with checkpointing enabled and capabilities [ PARTITION_AWARE ]
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213
01:38:28.419072 2809 master.cpp:2861] Updating info for framework
6aec32bf-cd60-4fa1-9992-f35af104f423-0009
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: W0213
01:38:28.419083 2809 master.hpp:2486] Cannot update FrameworkInfo.role to '*'
for framework 6aec32bf-cd60-4fa1-9992-f35af104f423-0009. Check MESOS-703
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: W0213
01:38:28.419091 2809 master.hpp:2497] Cannot update FrameworkInfo.principal to
'alice' for framework 6aec32bf-cd60-4fa1-9992-f35af104f423-0009. Check MESOS-703
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213
01:38:28.419111 2809 master.cpp:2874] Framework
6aec32bf-cd60-4fa1-9992-f35af104f423-0009 (integration_test) at
[email protected]:41805 failed over
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213
01:38:28.419245 2809 hierarchical.cpp:358] Activated framework
6aec32bf-cd60-4fa1-9992-f35af104f423-0009
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213
01:38:28.419543 2809 master.cpp:6664] Sending 1 offers to framework
6aec32bf-cd60-4fa1-9992-f35af104f423-0009 (integration_test) at
[email protected]:46530
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: F0213
01:38:28.426944 2809 master.cpp:1446] Check failed:
metrics->frameworks.contains(principal.get())
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: ***
Check failure stack trace: ***
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb678b831ad google::LogMessage::Fail()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb678b84fdd google::LogMessage::SendToLog()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb678b82d9c google::LogMessage::Flush()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb678b858d9 google::LogMessageFatal::~LogMessageFatal()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb6780453dd mesos::internal::master::Master::visit()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb678af7ca1 process::ProcessManager::resume()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb678b00ba7
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb676f90230 (unknown)
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb6767aedc5 start_thread
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @
0x7fb6764dd73d __clone
{noformat}
So the framework re-registered using its former framework ID but a new
principal and role. The master refuses to update either field (see the two
MESOS-703 warnings) and then crashes on the failed check shown above.
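One plausible reading of the log, sketched below as a heavily simplified model and not the actual Mesos source: the master keeps a map from framework PID to principal plus a set of per-principal metrics, and the fatal check assumes every principal in the first has an entry in the second. The type `MasterModel`, its members, the PIDs, and the old principal 'bob' are all hypothetical; only the new principal 'alice' and the shape of the check come from the log.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the master's bookkeeping (not Mesos code).
struct MasterModel {
  std::map<std::string, std::string> principals;  // framework PID -> principal
  std::set<std::string> metricsFrameworks;        // principals that have metrics

  // Normal registration keeps both structures in sync.
  void registerFramework(const std::string& pid, const std::string& principal) {
    principals[pid] = principal;
    metricsFrameworks.insert(principal);
  }

  // Models a failover that records the new principal for the new PID
  // but never creates metrics for it (the assumed source of the crash).
  void failoverWithoutMetrics(const std::string& oldPid,
                              const std::string& newPid,
                              const std::string& newPrincipal) {
    principals.erase(oldPid);
    principals[newPid] = newPrincipal;
  }

  // Stand-in for the condition inside the fatal check:
  // metrics->frameworks.contains(principal.get())
  bool checkWouldPass(const std::string& pid) const {
    auto it = principals.find(pid);
    if (it == principals.end()) {
      return true;  // unknown sender: nothing to check in this model
    }
    return metricsFrameworks.count(it->second) > 0;
  }
};

// Usage: re-registering under a new principal breaks the invariant.
inline void reproduce() {
  MasterModel master;
  master.registerFramework("scheduler-a@host:1", "bob");
  assert(master.checkWouldPass("scheduler-a@host:1"));

  master.failoverWithoutMetrics("scheduler-a@host:1", "scheduler-b@host:2", "alice");
  assert(!master.checkWouldPass("scheduler-b@host:2"));  // the real master aborts here
}
```

In this model the next message from the failed-over framework is what trips the check, which would match the crash appearing only after the offer was sent.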
> Update FrameworkInfo.principal on framework re-registration
> -----------------------------------------------------------
>
> Key: MESOS-2842
> URL: https://issues.apache.org/jira/browse/MESOS-2842
> Project: Mesos
> Issue Type: Bug
> Reporter: Vinod Kone
> Labels: security
>
> From the design doc:
> This is a bit involved because ‘principal’ is used for authentication and
> rate limiting.
> The authentication part is straightforward because a framework with an updated
> ‘principal’ should authenticate with the new ‘principal’ before being allowed
> to re-register. The ‘authenticated’ map already gets updated when the
> framework disconnects and reconnects, so it is fine.
> For rate limiting, Master::failoverFramework() needs to be changed to update
> the principal in the ‘frameworks.principals’ map and also remove the metrics for
> the old principal if there are no other frameworks with this principal
> (similar to what we do in Master::removeFramework()).
> Master::visit() and Master::_visit() should work with the current
> semantics.
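The rate-limiting change described above can be sketched as follows. This is a minimal model under stated assumptions, not the Mesos implementation: the class `PrincipalBook`, its methods, and the per-principal user count are hypothetical names introduced for illustration, standing in for the ‘frameworks.principals’ map and the per-principal metrics.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the bookkeeping the description refers to.
class PrincipalBook {
public:
  // Registers a framework PID under a principal, creating metrics on first use.
  void add(const std::string& pid, const std::string& principal) {
    principals_[pid] = principal;
    ++users_[principal];
  }

  // Failover: move the framework to a new PID and, possibly, a new principal.
  void failover(const std::string& oldPid,
                const std::string& newPid,
                const std::string& newPrincipal) {
    std::string oldPrincipal;
    bool hadOld = false;
    auto it = principals_.find(oldPid);
    if (it != principals_.end()) {
      oldPrincipal = it->second;
      hadOld = true;
      principals_.erase(it);
    }

    // Account for the new principal first, so that an unchanged principal
    // never drops to zero users and lose its metrics.
    principals_[newPid] = newPrincipal;
    ++users_[newPrincipal];

    if (hadOld) {
      release(oldPrincipal);
    }
  }

  bool hasMetrics(const std::string& principal) const {
    return users_.count(principal) > 0;
  }

private:
  // Drops the metrics entry once no framework uses the principal any more.
  void release(const std::string& principal) {
    auto it = users_.find(principal);
    if (it != users_.end() && --it->second == 0) {
      users_.erase(it);
    }
  }

  std::map<std::string, std::string> principals_;  // framework PID -> principal
  std::map<std::string, int> users_;               // principal -> framework count
};
```

With two frameworks registered as 'bob', failing one over to 'alice' keeps the 'bob' metrics alive; they are removed only when the second framework also moves away. Counting users per principal is one way to express "no other frameworks with this principal"; scanning the PID map would work equally well in a sketch like this.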