[
https://issues.apache.org/jira/browse/MESOS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229427#comment-13229427
]
[email protected] commented on MESOS-165:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4355/
-----------------------------------------------------------
(Updated 2012-03-14 17:48:10.906132)
Review request for mesos, Benjamin Hindman and Jessica.
Changes
-------
Add note to description.
Summary (updated)
-------
libprocess currently binds to INADDR_ANY and uses the result of getsockname()
as __ip__, overwriting the value it read from LIBPROCESS_IP. With this patch,
libprocess should instead keep the value from the environment variable whenever
one is supplied and it is not 0 (0 == INADDR_ANY), and fall back to
getsockname() only otherwise.
I think this bug is the cause of MESOS-165.
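To illustrate the selection described above, here is a minimal sketch. It is
not the actual diff: ipFromEnvironment() and chooseAdvertisedIp() are
hypothetical names that do not appear in process.cpp, and all addresses are
assumed to be IPv4 values in network byte order.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of libprocess): parse the text of
// LIBPROCESS_IP into an IPv4 address in network byte order. Returns
// INADDR_ANY (0) when the variable is unset or cannot be parsed.
inline uint32_t ipFromEnvironment(const char* value)
{
  if (value == NULL) {
    return htonl(INADDR_ANY);
  }

  in_addr parsed;
  if (inet_pton(AF_INET, value, &parsed) != 1) {
    return htonl(INADDR_ANY);
  }

  return parsed.s_addr;
}

// Hypothetical helper: pick the address to store in __ip__. A configured,
// nonzero address wins; otherwise keep whatever getsockname() reported.
inline uint32_t chooseAdvertisedIp(uint32_t configured, uint32_t fromGetsockname)
{
  return configured != htonl(INADDR_ANY) ? configured : fromGetsockname;
}
```

A caller would pass getenv("LIBPROCESS_IP") to ipFromEnvironment() and the
sin_addr.s_addr filled in by getsockname() as the second argument of
chooseAdvertisedIp().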
Note: I assumed that we are intentionally binding to INADDR_ANY regardless of
what LIBPROCESS_IP is set to, but if so, our documentation on the meaning of
the 'ip' option appears to be wrong. An alternative fix would be to change
process.cpp:1257 to use __ip__ for s_addr if it's not 0.
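A rough sketch of that alternative, under the assumption that the code at
process.cpp:1257 fills in a sockaddr_in before calling bind(). The helpers
makeBindAddress() and boundAddress() are hypothetical and only show the
effect: once the socket is bound to a specific address, getsockname() reports
that address rather than the wildcard.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// Hypothetical helper: build the address passed to bind(). `ip` and `port`
// are in network byte order. Because INADDR_ANY is 0, an unset ip (0) keeps
// the current wildcard behavior, and any other value binds to that address.
inline sockaddr_in makeBindAddress(uint32_t ip, uint16_t port)
{
  sockaddr_in addr;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = ip;
  addr.sin_port = port;
  return addr;
}

// Hypothetical helper: bind a TCP socket to `ip` on an ephemeral port and
// store the local address reported by getsockname() in `reported`.
// Returns false if any system call fails.
inline bool boundAddress(uint32_t ip, uint32_t* reported)
{
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) {
    return false;
  }

  sockaddr_in addr = makeBindAddress(ip, 0);
  bool ok = bind(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;

  sockaddr_in local;
  socklen_t length = sizeof(local);
  ok = ok &&
    getsockname(s, reinterpret_cast<sockaddr*>(&local), &length) == 0;

  if (ok) {
    *reported = local.sin_addr.s_addr;
  }

  close(s);
  return ok;
}
```

The trade-off is that the process would then accept connections only on the
configured interface, whereas the fix in this patch keeps listening on all
interfaces and changes only the address that is advertised.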
This addresses bug MESOS-165.
https://issues.apache.org/jira/browse/MESOS-165
Diffs
-----
third_party/libprocess/src/process.cpp 7433be8
Diff: https://reviews.apache.org/r/4355/diff
Testing
-------
Thanks,
Charles
> Slaves die after initial registration with master with "Network is
> unreachable" error
> -------------------------------------------------------------------------------------
>
> Key: MESOS-165
> URL: https://issues.apache.org/jira/browse/MESOS-165
> Project: Mesos
> Issue Type: Bug
> Components: master, slave
> Environment: Scientific Linux 6.2 internal cluster
> Reporter: Jessica J
> Assignee: Charles Reiss
> Priority: Blocker
>
> I am using a cluster in which only the master is externally accessible, so
> when I start the master, I set --ip to one of its internal IP addresses so
> that it can communicate with its slaves. I have also tried setting this IP
> address in mesos-env.sh (in the deploy directory) by setting LIBPROCESS_IP,
> but each time the master starts, it says that it is running at the external
> IP address (as if it is ignoring the --ip or LIBPROCESS_IP options).
> When I start a slave, I tell it that the master is at an internal IP address
> (no matter what the master says it's running at), so the initial connection
> is successful. (I get messages output from both the slave and the master
> saying the connection was successful.) However, after registering, the slave
> *immediately* dies. My guess is that upon successful connection, the master
> tells the slave to communicate with it on the external IP address, but since
> the slave has no access to the Internet, any further communication fails.
> The following is the error message the slave gives when it dies:
> F0314 12:25:45.196940 13406 process.cpp:1576] Failed to link, connect:
> Network is unreachable [101]
> *** Check failure stack trace: ***
> @ 0x7f7d6be3342d google::LogMessage::Fail()
> @ 0x7f7d6be36ae7 google::LogMessage::SendToLog()
> @ 0x7f7d6be36066 google::LogMessage::Flush()
> @ 0x7f7d6be36279 google::LogMessage::~LogMessage()
> @ 0x7f7d6be39351 google::ErrnoLogMessage::~ErrnoLogMessage()
> @ 0x7f7d6be47319 process::SocketManager::link()
> @ 0x7f7d6be4bc88 process::ProcessManager::link()
> @ 0x7f7d6be4ed98 process::ProcessBase::link()
> @ 0x7f7d6bcaf575 mesos::internal::slave::Slave::newMasterDetected()
> @ 0x7f7d6bcbbd7f ProtobufProcess<>::handler1<>()
> @ 0x7f7d6bcbe477 ProtobufProcess<>::visit()
> @ 0x7f7d6be504e0 process::MessageEvent::visit()
> @ 0x7f7d6be4b448 process::ProcessManager::resume()
> @ 0x7f7d6be43bae process::schedule()
> @ 0x7f7d6b5a77f1 start_thread
> @ 0x7f7d6a93c92d clone
> Aborted
> I have looked at the code (master.cpp, process.cpp, main.cpp, slave.cpp,
> mesos-master.sh, etc.) and tried to determine why the ip option is getting
> ignored, but I have thus far been unsuccessful.