[
https://issues.apache.org/jira/browse/MESOS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229491#comment-13229491
]
Jessica J edited comment on MESOS-165 at 3/14/12 6:55 PM:
----------------------------------------------------------
Well, it looked promising, but the problem is still there after applying the
patch and rebuilding. Is there somewhere else it could be getting overwritten?
Edit: Correction: the patch fixed the issue with --ip supplied to mesos-master.
It does not fix setting LIBPROCESS_IP in mesos-env.sh. Because command-line
options are not passed to mesos-master when running with the deploy scripts
(start-mesos and start-masters), the only way to specify a master's IP is to
run mesos-master directly with --ip. (Second edit: the lack of command-line
options in the deploy scripts should probably be considered a second bug,
though not a blocker in this case.)
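
The workaround described in this comment (launching mesos-master directly so
that --ip actually reaches it) could be scripted roughly as follows. This is a
minimal sketch and not part of Mesos: the helper name build_master_command is
hypothetical, and the "--ip=<address>" spelling is an assumption about the flag
syntax, since the comment only confirms that an --ip option exists.

```python
import os
import shlex


def build_master_command(env, binary="mesos-master"):
    """Return the argv for launching the master directly.

    If LIBPROCESS_IP is set in `env`, forward it as an explicit --ip option,
    because the comment reports that only the command-line form takes effect.
    `binary` and the "--ip=" spelling are assumptions for illustration.
    """
    argv = [binary]
    ip = env.get("LIBPROCESS_IP", "").strip()
    if ip:
        argv.append("--ip=" + ip)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(shlex.quote(arg) for arg in build_master_command(os.environ)))
```

Printing the command rather than executing it keeps the sketch safe to try on
a node where the master is already running.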
> Slaves die after initial registration with master with "Network is
> unreachable" error
> -------------------------------------------------------------------------------------
>
> Key: MESOS-165
> URL: https://issues.apache.org/jira/browse/MESOS-165
> Project: Mesos
> Issue Type: Bug
> Components: master, slave
> Environment: Scientific Linux 6.2 internal cluster
> Reporter: Jessica J
> Assignee: Charles Reiss
> Priority: Blocker
>
> I am using a cluster in which only the master is externally accessible, so
> when I start the master, I set --ip to one of its internal IP addresses so
> that it can communicate with its slaves. I have also tried setting this IP
> address in mesos-env.sh (in the deploy directory) by setting LIBPROCESS_IP,
> but each time the master starts, it says that it is running at the external
> IP address (as if it is ignoring the --ip or LIBPROCESS_IP options).
> When I start a slave, I tell it that the master is at an internal IP address
> (no matter what the master says it's running at), so the initial connection
> is successful. (I get messages output from both the slave and the master
> saying the connection was successful.) However, after registering, the slave
> *immediately* dies. My guess is that upon successful connection, the master
> tells the slave to communicate with it on the external IP address, but since
> the slave has no access to the Internet, any further communication fails.
> The following is the error message the slave gives when it dies:
> F0314 12:25:45.196940 13406 process.cpp:1576] Failed to link, connect:
> Network is unreachable [101]
> *** Check failure stack trace: ***
> @ 0x7f7d6be3342d google::LogMessage::Fail()
> @ 0x7f7d6be36ae7 google::LogMessage::SendToLog()
> @ 0x7f7d6be36066 google::LogMessage::Flush()
> @ 0x7f7d6be36279 google::LogMessage::~LogMessage()
> @ 0x7f7d6be39351 google::ErrnoLogMessage::~ErrnoLogMessage()
> @ 0x7f7d6be47319 process::SocketManager::link()
> @ 0x7f7d6be4bc88 process::ProcessManager::link()
> @ 0x7f7d6be4ed98 process::ProcessBase::link()
> @ 0x7f7d6bcaf575 mesos::internal::slave::Slave::newMasterDetected()
> @ 0x7f7d6bcbbd7f ProtobufProcess<>::handler1<>()
> @ 0x7f7d6bcbe477 ProtobufProcess<>::visit()
> @ 0x7f7d6be504e0 process::MessageEvent::visit()
> @ 0x7f7d6be4b448 process::ProcessManager::resume()
> @ 0x7f7d6be43bae process::schedule()
> @ 0x7f7d6b5a77f1 start_thread
> @ 0x7f7d6a93c92d clone
> Aborted
> I have looked at the code (master.cpp, process.cpp, main.cpp, slave.cpp,
> mesos-master.sh, etc.) and tried to determine why the ip option is getting
> ignored, but I have thus far been unsuccessful.
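
One way to test the hypothesis in the description (that the slave is told to
reach the master at an address it has no route to) is to attempt a plain TCP
connection from a slave node to whatever address the master reports. The sketch
below is an illustration only: check_reachable is a hypothetical helper, and
the host and port must be supplied by the user. On Linux, errno 101 is
ENETUNREACH, which matches the "[101]" in the log above.

```python
import errno
import socket
import sys


def check_reachable(host, port, timeout=3.0):
    """Try a TCP connect and return (ok, errno_or_None, message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None, "connected"
    except socket.timeout:
        return False, errno.ETIMEDOUT, "timed out"
    except OSError as exc:
        # exc.errno is e.g. ENETUNREACH when no route to the network exists.
        return False, exc.errno, exc.strerror or str(exc)


if __name__ == "__main__":
    # Usage: python check_reachable.py <master-address> <master-port>
    host, port = sys.argv[1], int(sys.argv[2])
    ok, code, message = check_reachable(host, port)
    name = errno.errorcode.get(code, "n/a") if code is not None else "n/a"
    print(f"{host}:{port} ok={ok} errno={code} ({name}) {message}")
```

Running it once against the internal address and once against the external
address the master prints at startup should show whether only the latter fails.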