[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266456#comment-15266456 ]
Marvin Frick commented on MESOS-2043:
-------------------------------------
> When a framework gets hit by this are you able to connect slaves?
Yes. I just stopped and started a slave to verify:
{code}
I0502 11:41:12.680176 24762 slave.cpp:796] New master detected at
[email protected]:5050
I0502 11:41:12.680429 24762 slave.cpp:859] Authenticating with master
[email protected]:5050
I0502 11:41:12.680464 24762 slave.cpp:864] Using default CRAM-MD5 authenticatee
I0502 11:41:12.680542 24762 slave.cpp:832] Detecting new master
I0502 11:41:12.680718 24758 authenticatee.cpp:97] Initializing client SASL
I0502 11:41:12.682632 24758 authenticatee.cpp:121] Creating new client SASL
connection
I0502 11:41:12.685474 24762 authenticatee.cpp:212] Received SASL authentication
mechanisms: CRAM-MD5
I0502 11:41:12.685564 24762 authenticatee.cpp:238] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0502 11:41:12.687180 24759 authenticatee.cpp:258] Received SASL authentication
step
I0502 11:41:12.692947 24759 authenticatee.cpp:298] Authentication success
I0502 11:41:12.693356 24759 slave.cpp:927] Successfully authenticated with
master [email protected]:5050
I0502 11:41:12.695637 24763 slave.cpp:1071] Re-registered with master
[email protected]:5050
I0502 11:41:12.695729 24759 status_update_manager.cpp:181] Resuming sending
status updates
I0502 11:41:12.695719 24763 slave.cpp:1107] Forwarding total oversubscribed
resources
I0502 11:41:12.696529 24760 slave.cpp:2341] Updated checkpointed resources from
to
{code}
We do use different credentials for slaves vs. frameworks though:
{code}
root@master1-pillartwo2:~# cat /etc/mesos/here_be_credentials
pricipalforslaves aaaabbbbbbbcccedddeeeffffggghhhhhiiiijjjjkkkkllllmmnnn
marathonprincipal 112233445566778899000112233445566778899000112233445566778899000
{code}
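The master's file above lists one principal and its secret per line, separated by whitespace. A minimal sketch of reading a file in that layout, assuming every non-blank line follows it; `parse_credentials` is a hypothetical helper for illustration, not part of Mesos, and the sample values are made up:

```python
def parse_credentials(text):
    """Map each principal to its secret, one whitespace-separated pair per line."""
    credentials = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 'principal secret'")
        principal, secret = fields
        credentials[principal] = secret
    return credentials


# Made-up sample in the same shape as the master's file.
sample = "slaveprincipal slavesecret\nframeworkprincipal frameworksecret\n"
print(parse_credentials(sample))
```

A line with a missing or extra field raises `ValueError`, which makes a wrapped or truncated entry easy to spot.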
{code}
root@master1-pillartwo2:~# cat /etc/marathon/here_be_credentials
112233445566778899000112233445566778899000112233445566778899000
{code}
Please note that there is no trailing newline in the Marathon credentials file above,
see https://groups.google.com/forum/#!topic/marathon-framework/bwgWjsG-QFU
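One way to confirm that a secret file really ends without a newline is to look at its last byte. A minimal sketch; `ends_with_newline` and `demo` are hypothetical helpers, and the demo writes a temporary file instead of touching the real path:

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the last byte of the file is a newline character."""
    with open(path, "rb") as handle:
        data = handle.read()
    return data.endswith(b"\n")


def demo():
    """Write a made-up secret without a trailing newline and check it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"examplesecret")  # no newline appended
        return ends_with_newline(path)
    finally:
        os.remove(path)


print(demo())  # prints False
```

Many editors append a newline on save, so checking the bytes is more reliable than eyeballing `cat` output.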
{code}
root@slave1-pillartwo2:~# cat /etc/mesos/here_be_credentials
pricipalforslaves aaaabbbbbbbcccedddeeeffffggghhhhhiiiijjjjkkkkllllmmnnn
{code}
> framework auth fail with timeout error and never get authenticated
> ------------------------------------------------------------------
>
> Key: MESOS-2043
> URL: https://issues.apache.org/jira/browse/MESOS-2043
> Project: Mesos
> Issue Type: Bug
> Components: master, scheduler driver, security, slave
> Affects Versions: 0.21.0
> Reporter: Bhuvan Arumugam
> Assignee: Greg Mann
> Priority: Critical
> Labels: mesosphere, security
> Fix For: 0.29.0
>
> Attachments: aurora-scheduler.20141104-1606-1706.log, master.log,
> mesos-master.20141104-1606-1706.log, slave.log
>
>
> I'm facing this issue in master as of
> https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4
> As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm
> running 1 master and 1 scheduler (aurora). The framework authentication fails
> due to a timeout:
> error on mesos master:
> {code}
> I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating
> scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083
> I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5
> authenticator
> I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL
> connection
> W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out
> W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate
> scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083:
> Authentication discarded
> {code}
> scheduler error:
> {code}
> I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master
> master@MASTER_IP:PORT
> I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL
> connection
> I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL
> authentication mechanisms: CRAM-MD5
> I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out
> I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master
> master@MASTER_IP:PORT: Authentication discarded
> {code}
> It looks like 2 instances {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} and
> {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}} of the same framework are
> trying to authenticate and failing.
> {code}
> W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate
> scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to
> communicate with authenticatee
> I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication
> request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083
> because authentication is still in progress
> {code}
> Restarting master and scheduler didn't fix it.
> This particular issue happens with 1 master and 1 scheduler after MESOS-1866
> is fixed.
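In the master log quoted above, the timeout warning follows the creation of the server SASL connection by about five seconds. A minimal sketch for measuring such gaps from glog-style lines, assuming the `Lmmdd hh:mm:ss.uuuuuu` prefix seen in these logs. The log lines carry no year, so it is passed in (2014 here, taken from the attachment names). `glog_time` and `seconds_between` are hypothetical helpers:

```python
import re
from datetime import datetime

# Severity letter, month and day, wall-clock time, then the thread id.
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ ")


def glog_time(line, year=2014):
    """Parse the timestamp at the start of a glog-style line."""
    match = GLOG_PREFIX.match(line)
    if match is None:
        raise ValueError(f"not a glog line: {line!r}")
    month_day, clock = match.groups()
    return datetime.strptime(f"{year}{month_day} {clock}", "%Y%m%d %H:%M:%S.%f")


def seconds_between(first, second, year=2014):
    """Return the elapsed seconds from the first line to the second."""
    return (glog_time(second, year) - glog_time(first, year)).total_seconds()


# Two lines from the master log in the issue description, unwrapped.
started = ("I1104 19:37:17.742106 8336 authenticator.hpp:169] "
           "Creating new server SASL connection")
timed_out = "W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out"
print(seconds_between(started, timed_out))
```

The same helper applied to the scheduler log gives the gap between the `Attempting to authenticate` line and its timeout warning.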