[jira] [Commented] (MESOS-165) Slaves die after initial registration with master with "Network is unreachable" error

Jessica J (Commented) (JIRA) Fri, 23 Mar 2012 05:29:59 -0700

    [ https://issues.apache.org/jira/browse/MESOS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236541#comment-13236541 ]


Jessica J commented on MESOS-165:
---------------------------------

Thanks, this works, assuming each of my masters and slaves has its own 
mesos-env.sh so that they don't all try to bind to the same LIBPROCESS_IP. 
Please confirm that this is the intended setup.
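
To illustrate the per-host setup described above, here is a minimal sketch of how a distinct LIBPROCESS_IP value could be chosen for each machine. The host names, addresses, the 10.0.0.0/24 subnet, and the `export` form of the line are all assumptions made for illustration; none of them come from this issue or from the Mesos deploy scripts.

```python
import ipaddress


def pick_internal_ip(addresses, internal_subnet):
    """Return the first address that lies inside internal_subnet, or None."""
    net = ipaddress.ip_network(internal_subnet)
    for addr in addresses:
        if ipaddress.ip_address(addr) in net:
            return addr
    return None


def render_env_line(addresses, internal_subnet):
    """Build the line a host's own mesos-env.sh might contain (assumed form)."""
    ip = pick_internal_ip(addresses, internal_subnet)
    if ip is None:
        raise ValueError("no address on the internal subnet")
    return f"export LIBPROCESS_IP={ip}"


# Hypothetical cluster: the master also has an external address,
# the slaves only have internal ones.
hosts = {
    "master": ["203.0.113.10", "10.0.0.1"],
    "slave1": ["10.0.0.11"],
    "slave2": ["10.0.0.12"],
}

# One line per host, each with a different address to bind to.
env_lines = {
    name: render_env_line(addrs, "10.0.0.0/24") for name, addrs in hosts.items()
}

for name, line in env_lines.items():
    print(f"{name}: {line}")
```

The point of the sketch is only that every host ends up with its own value, so a single mesos-env.sh copied unchanged to all machines would not work.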
                
> Slaves die after initial registration with master with "Network is 
> unreachable" error
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-165
>                 URL: https://issues.apache.org/jira/browse/MESOS-165
>             Project: Mesos
>          Issue Type: Bug
>          Components: master, slave
>         Environment: Scientific Linux 6.2 internal cluster
>            Reporter: Jessica J
>            Assignee: Charles Reiss
>            Priority: Blocker
>
> I am using a cluster in which only the master is externally accessible, so 
> when I start the master, I set --ip to one of its internal IP addresses so 
> that it can communicate with its slaves. I have also tried setting this IP 
> address in mesos-env.sh (in the deploy directory) by setting LIBPROCESS_IP, 
> but each time the master starts, it says that it is running at the external 
> IP address (as if it is ignoring the --ip or LIBPROCESS_IP options).
> When I start a slave, I tell it that the master is at an internal IP address 
> (no matter what the master says it's running at), so the initial connection 
> is successful. (I get messages output from both the slave and the master 
> saying the connection was successful.) However, after registering, the slave 
> *immediately* dies. My guess is that upon successful connection, the master 
> tells the slave to communicate with it on the external IP address, but since 
> the slave has no access to the Internet, any further communication fails. 
> The following is the error message the slave gives when it dies:
> F0314 12:25:45.196940 13406 process.cpp:1576] Failed to link, connect: Network is unreachable [101]
> *** Check failure stack trace: ***
>     @     0x7f7d6be3342d  google::LogMessage::Fail()
>     @     0x7f7d6be36ae7  google::LogMessage::SendToLog()
>     @     0x7f7d6be36066  google::LogMessage::Flush()
>     @     0x7f7d6be36279  google::LogMessage::~LogMessage()
>     @     0x7f7d6be39351  google::ErrnoLogMessage::~ErrnoLogMessage()
>     @     0x7f7d6be47319  process::SocketManager::link()
>     @     0x7f7d6be4bc88  process::ProcessManager::link()
>     @     0x7f7d6be4ed98  process::ProcessBase::link()
>     @     0x7f7d6bcaf575  mesos::internal::slave::Slave::newMasterDetected()
>     @     0x7f7d6bcbbd7f  ProtobufProcess<>::handler1<>()
>     @     0x7f7d6bcbe477  ProtobufProcess<>::visit()
>     @     0x7f7d6be504e0  process::MessageEvent::visit()
>     @     0x7f7d6be4b448  process::ProcessManager::resume()
>     @     0x7f7d6be43bae  process::schedule()
>     @     0x7f7d6b5a77f1  start_thread
>     @     0x7f7d6a93c92d  clone
> Aborted
> I have looked at the code (master.cpp, process.cpp, main.cpp, slave.cpp, 
> mesos-master.sh, etc.) and tried to determine why the IP setting is being 
> ignored, but I have thus far been unsuccessful.
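
As a small aid to reading the fatal log line quoted above, the following sketch pulls the error text and the bracketed number out of it and looks the number up in Python's errno table. The helper name `parse_errno_suffix` is hypothetical, and treating the trailing `[101]` as an errno value is an assumption based on the line's format. On Linux, errno 101 is ENETUNREACH, which would be consistent with the reporter's guess that the slave tried to connect to an address it has no route to.

```python
import errno
import re

# The fatal line from the report, joined onto one line.
LOG_LINE = (
    "F0314 12:25:45.196940 13406 process.cpp:1576] "
    "Failed to link, connect: Network is unreachable [101]"
)


def parse_errno_suffix(line):
    """Return (message, code) from a trailing 'message [code]' suffix, or None."""
    match = re.search(r":\s*([^:\[\]]+?)\s*\[(\d+)\]\s*$", line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


message, code = parse_errno_suffix(LOG_LINE)

# Map the number to a symbolic name using this platform's errno table.
symbol = errno.errorcode.get(code, "unknown")

print(f"{message!r} -> errno {code} ({symbol} on this platform)")
```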

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
