[ 
https://issues.apache.org/jira/browse/MESOS-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778943#comment-13778943
 ] 

Timothy St. Clair commented on MESOS-304:
-----------------------------------------

This is an issue in some automated build environments like koji 
(https://fedoraproject.org/wiki/SIGs/bigdata/packaging/Mesos#Failing_Tests) 
which disable DNS.  

Another option might be to check for DNS during configure and selectively 
disable the affected tests with a preprocessor variable. 
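
A rough sketch of what that could look like, purely as an illustration. The 
macro name MESOS_DISABLE_DNS_TESTS and the helper runDnsDependentTest are 
hypothetical (neither exists in Mesos); the assumption is that a configure-time 
check would define the macro when the build host cannot resolve hostnames.

```cpp
#include <iostream>
#include <string>

// Hypothetical macro: assumed to be defined by a configure-time check
// (for example via the compiler command line) when DNS is unavailable.
#ifdef MESOS_DISABLE_DNS_TESTS
constexpr bool kDnsTestsEnabled = false;
#else
constexpr bool kDnsTestsEnabled = true;
#endif

// Hypothetical helper: returns true if the DNS-dependent test would run,
// false if it is skipped because DNS was disabled at configure time.
bool runDnsDependentTest(const std::string& name)
{
  if (!kDnsTestsEnabled) {
    std::cout << "[ SKIPPED ] " << name << " (DNS disabled at configure time)"
              << std::endl;
    return false;
  }

  std::cout << "[ RUN     ] " << name << std::endl;
  // The real test body would go here.
  return true;
}
```

Building with -DMESOS_DISABLE_DNS_TESTS would then skip such tests on koji 
while leaving them enabled everywhere else.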
                
> Master should register a slave only after it confirms it can talk to the slave
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-304
>                 URL: https://issues.apache.org/jira/browse/MESOS-304
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Vinod Kone
>
> We have seen this issue from users running on EC2 and also at Twitter.
> The crux of the issue is that the master starts offering the resources of a 
> slave as soon as it gets a Register message. If for some reason the master 
> --> slave connection is not viable (e.g. slave used its private ip address, 
> DNS failures), we end up in a loop as follows:
> --> Slave sends Register message to master
> --> Master accepts it and offers resources to the framework
> --> The master's health checks to the slave keep failing
> --> Framework launches tasks on this slave, which would be dropped on the 
> floor
> --> After health check timeout (>60s), master disconnects the slave
> --> Slave sends a Register message again.
> --> Repeat
> One way to solve this problem is to do a 3-way handshake for registration.
> This should also be done for framework registration.
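
For reference, the 3-way handshake proposed in the issue could be sketched 
roughly as follows. This is only an illustration under assumptions: the class 
and method names (Master, onRegister, onProbeAck, canOffer) are hypothetical 
and do not reflect the actual Mesos master code. The idea is that a Register 
message only marks the slave as pending, and resources are offered only after 
the slave acknowledges a probe sent over the master --> slave connection.

```cpp
#include <map>
#include <string>

enum class SlaveState { UNKNOWN, PENDING, REGISTERED };

// Hypothetical model of the master side of the handshake.
class Master
{
public:
  // Step 1 (slave --> master): Register received. The slave is only marked
  // pending. Returns true if a probe should now be sent to the slave
  // (step 2, master --> slave); false if the slave is already registered.
  bool onRegister(const std::string& slaveId)
  {
    if (state(slaveId) == SlaveState::REGISTERED) {
      return false;
    }
    slaves_[slaveId] = SlaveState::PENDING;
    return true;
  }

  // Step 3 (slave --> master): the slave acknowledged the probe, which
  // shows the master --> slave connection works. Only pending slaves can
  // be promoted to registered.
  bool onProbeAck(const std::string& slaveId)
  {
    auto it = slaves_.find(slaveId);
    if (it == slaves_.end() || it->second != SlaveState::PENDING) {
      return false;
    }
    it->second = SlaveState::REGISTERED;
    return true;
  }

  // Resources are offered to frameworks only for registered slaves.
  bool canOffer(const std::string& slaveId) const
  {
    return state(slaveId) == SlaveState::REGISTERED;
  }

  SlaveState state(const std::string& slaveId) const
  {
    auto it = slaves_.find(slaveId);
    return it == slaves_.end() ? SlaveState::UNKNOWN : it->second;
  }

private:
  std::map<std::string, SlaveState> slaves_;
};
```

With this shape, a slave that registers with an unreachable address stays 
pending and never has its resources offered, so no tasks are launched on it.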

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
