[ https://issues.apache.org/jira/browse/MESOS-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863104#comment-15863104 ]
Till Toenshoff commented on MESOS-2842:
---------------------------------------

This is what it looks like when running into this issue:
{noformat}
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213 01:38:28.419044 2809 master.cpp:2783] Subscribing framework integration_test with checkpointing enabled and capabilities [ PARTITION_AWARE ]
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213 01:38:28.419072 2809 master.cpp:2861] Updating info for framework 6aec32bf-cd60-4fa1-9992-f35af104f423-0009
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: W0213 01:38:28.419083 2809 master.hpp:2486] Cannot update FrameworkInfo.role to '*' for framework 6aec32bf-cd60-4fa1-9992-f35af104f423-0009. Check MESOS-703
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: W0213 01:38:28.419091 2809 master.hpp:2497] Cannot update FrameworkInfo.principal to 'alice' for framework 6aec32bf-cd60-4fa1-9992-f35af104f423-0009. Check MESOS-703
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213 01:38:28.419111 2809 master.cpp:2874] Framework 6aec32bf-cd60-4fa1-9992-f35af104f423-0009 (integration_test) at scheduler-188c0a58-9b44-4e2b-b133-a7c15b37fc55@127.0.0.1:41805 failed over
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213 01:38:28.419245 2809 hierarchical.cpp:358] Activated framework 6aec32bf-cd60-4fa1-9992-f35af104f423-0009
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: I0213 01:38:28.419543 2809 master.cpp:6664] Sending 1 offers to framework 6aec32bf-cd60-4fa1-9992-f35af104f423-0009 (integration_test) at scheduler-7fff5d25-a121-48bf-8849-1948b161d729@127.0.0.1:46530
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: F0213 01:38:28.426944 2809 master.cpp:1446] Check failed: metrics->frameworks.contains(principal.get())
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: *** Check failure stack trace: ***
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb678b831ad google::LogMessage::Fail()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb678b84fdd google::LogMessage::SendToLog()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb678b82d9c google::LogMessage::Flush()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb678b858d9 google::LogMessageFatal::~LogMessageFatal()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb6780453dd mesos::internal::master::Master::visit()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb678af7ca1 process::ProcessManager::resume()
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb678b00ba7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb676f90230 (unknown)
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb6767aedc5 start_thread
Feb 13 01:38:28 test-277bcd0b-fe0e-468a-a9b5-ee624538ac4b mesos-master: @ 0x7fb6764dd73d __clone
{noformat}

> Update FrameworkInfo.principal on framework re-registration
> -----------------------------------------------------------
>
> Key: MESOS-2842
> URL: https://issues.apache.org/jira/browse/MESOS-2842
> Project: Mesos
> Issue Type: Bug
> Reporter: Vinod Kone
> Labels: security
>
> From the design doc:
> This is a bit involved because ‘principal’ is used for authentication and
> rate limiting.
> The authentication part is straightforward because a framework with an updated
> ‘principal’ should authenticate with the new ‘principal’ before being allowed
> to re-register. The ‘authenticated’ map already gets updated when the
> framework disconnects and reconnects, so it is fine.
> For rate limiting, Master::failoverFramework() needs to be changed to update
> the principal in the ‘frameworks.principals’ map and also remove the metrics for
> the old principal if there are no other frameworks with this principal
> (similar to what we do in Master::removeFramework()).
> The Master::visit() and Master::_visit() should work with the current
> semantics.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
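The rate-limiting change proposed in the issue description (update the principal in `frameworks.principals` during `Master::failoverFramework()`, then drop the old principal's metrics only if no other framework still uses it, as `Master::removeFramework()` does) can be sketched as below. This is a minimal, self-contained C++17 model, not Mesos code: `MasterModel`, `PrincipalMetrics`, `inUse()` and the string-keyed `std::unordered_map`s are hypothetical stand-ins for `frameworks.principals` and `metrics->frameworks`, and frameworks are keyed by a scheduler PID string. Creating a metrics entry for the new principal is our assumption, inferred from the `metrics->frameworks.contains(principal.get())` CHECK that fails in `Master::visit()` in the log above.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-principal metrics used for rate limiting.
struct PrincipalMetrics {
  long messagesReceived = 0;
};

// Hypothetical, simplified model of the master's bookkeeping (not Mesos code).
class MasterModel {
 public:
  // Scheduler PID -> principal; models 'frameworks.principals'.
  std::unordered_map<std::string, std::optional<std::string>> principals;
  // Principal -> metrics; models 'metrics->frameworks'.
  std::unordered_map<std::string, std::unique_ptr<PrincipalMetrics>> metrics;

  // Records a framework's principal and makes sure metrics exist for it.
  void addFramework(const std::string& pid,
                    const std::optional<std::string>& principal) {
    principals[pid] = principal;
    if (principal && metrics.count(*principal) == 0) {
      metrics[*principal] = std::make_unique<PrincipalMetrics>();
    }
    // The invariant the CHECK in Master::visit() relies on.
    assert(!principal || metrics.count(*principal) == 1);
  }

  // Failover: move the framework to its new PID and principal, then drop the
  // old principal's metrics if no other framework still uses that principal.
  void failoverFramework(const std::string& oldPid, const std::string& newPid,
                         const std::optional<std::string>& newPrincipal) {
    std::optional<std::string> oldPrincipal;
    auto it = principals.find(oldPid);
    if (it != principals.end()) {
      oldPrincipal = it->second;
      principals.erase(it);
    }

    addFramework(newPid, newPrincipal);

    if (oldPrincipal && oldPrincipal != newPrincipal &&
        !inUse(*oldPrincipal)) {
      metrics.erase(*oldPrincipal);
    }
  }

  // Counts a message from 'from'. Returns false where the real master
  // would abort on the failed CHECK (principal known, metrics missing).
  bool visit(const std::string& from) {
    auto it = principals.find(from);
    if (it == principals.end() || !it->second) {
      return true;  // Unknown sender or no principal: nothing to count.
    }
    auto m = metrics.find(*it->second);
    if (m == metrics.end()) {
      return false;
    }
    ++m->second->messagesReceived;
    return true;
  }

 private:
  // True if any registered framework still uses 'principal'.
  bool inUse(const std::string& principal) const {
    for (const auto& entry : principals) {
      const auto& p = entry.second;
      if (p && *p == principal) {
        return true;
      }
    }
    return false;
  }
};
```

In this model, a framework that fails over from principal `bob` to `alice` leaves no stale `bob` metrics behind (unless another framework still uses `bob`), and the next message from the new scheduler PID finds metrics for `alice`, so `visit()` returns `true` instead of tripping the CHECK.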