[
https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185910#comment-15185910
]
Kevin Cox commented on MESOS-2043:
----------------------------------
I would just like to say that this issue is a real pain to deal with. I ran
into this yesterday and waited over an hour, and the issue still occurred; the
slave only reconnected when I tried connecting it again the next morning. This
makes me terrified to restart any of my slaves, and I have to babysit them while
they are connecting to Mesos. This is the exact opposite of what I want. I want
to be able to spin nodes up and down without worrying about them never being
able to connect to Mesos.
On another note, I am very happy to test and review patches.
> framework auth fail with timeout error and never get authenticated
> ------------------------------------------------------------------
>
> Key: MESOS-2043
> URL: https://issues.apache.org/jira/browse/MESOS-2043
> Project: Mesos
> Issue Type: Bug
> Components: master, scheduler driver, security, slave
> Affects Versions: 0.21.0
> Reporter: Bhuvan Arumugam
> Priority: Critical
> Labels: mesosphere, security
> Attachments: aurora-scheduler.20141104-1606-1706.log,
> mesos-master.20141104-1606-1706.log
>
>
> I'm facing this issue in master as of
> https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4
> As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm
> running 1 master and 1 scheduler (Aurora). The framework authentication fails
> due to a timeout:
> error on mesos master:
> {code}
> I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083
> I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 authenticator
> I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL connection
> W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out
> W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: Authentication discarded
> {code}
> scheduler error:
> {code}
> I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master master@MASTER_IP:PORT
> I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL connection
> I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5
> I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5'
> W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out
> I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master master@MASTER_IP:PORT: Authentication discarded
> {code}
> It looks like 2 instances of the same framework,
> {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} &
> {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}}, are trying to
> authenticate and failing.
> {code}
> W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to communicate with authenticatee
> I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 because authentication is still in progress
> {code}
> Restarting the master and scheduler didn't fix it.
> This particular issue happens with 1 master and 1 scheduler after MESOS-1866
> was fixed.