[ 
https://issues.apache.org/jira/browse/MESOS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229427#comment-13229427
 ] 

[email protected] commented on MESOS-165:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4355/
-----------------------------------------------------------

(Updated 2012-03-14 17:48:10.906132)


Review request for mesos, Benjamin Hindman and Jessica.


Changes
-------

Add note to description.


Summary (updated)
-------

libprocess currently binds to INADDR_ANY and uses the result of getsockname() 
as __ip__, overwriting the value it read from LIBPROCESS_IP. This patch should 
make it use the environment variable's value whenever one is supplied and is 
nonzero (0 == INADDR_ANY), and fall back to getsockname() only otherwise.
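
For illustration, here is a minimal sketch of that selection logic. It is not 
the patch itself: parseIp, chooseAdvertisedIp and advertisedIp are hypothetical 
names, IPv4 is assumed, and an unset or unparsable LIBPROCESS_IP is treated as 
INADDR_ANY. Only getenv(), inet_pton() and getsockname() are real calls.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: parse a dotted-quad string into a network-order
// address. Returns INADDR_ANY (0) when the value is missing or malformed.
in_addr_t parseIp(const char* value) {
  in_addr parsed;
  if (value == nullptr || inet_pton(AF_INET, value, &parsed) != 1) {
    return INADDR_ANY;
  }
  return parsed.s_addr;
}

// Hypothetical helper: prefer the configured address and fall back to the
// address reported by the socket only when nothing usable was configured.
in_addr_t chooseAdvertisedIp(in_addr_t configured, in_addr_t fromSocket) {
  return configured != INADDR_ANY ? configured : fromSocket;
}

// Assumed wiring: 'fd' is a socket that has already been bound.
in_addr_t advertisedIp(int fd) {
  assert(fd >= 0);
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  socklen_t length = sizeof(addr);
  in_addr_t fromSocket = INADDR_ANY;
  if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &length) == 0) {
    fromSocket = addr.sin_addr.s_addr;
  }
  return chooseAdvertisedIp(parseIp(std::getenv("LIBPROCESS_IP")), fromSocket);
}
```

With LIBPROCESS_IP set to an internal address, advertisedIp() would return that 
address even though the socket itself is bound to INADDR_ANY.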

I think this bug is the cause of MESOS-165.

Note: I assumed that we intentionally bind to INADDR_ANY regardless of what 
LIBPROCESS_IP is set to. If that is the case, our documentation on the meaning 
of the 'ip' option appears to be wrong. An alternate fix would be to change 
process.cpp:1257 to use __ip__ for s_addr if it's not 0.
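
The alternate fix could look roughly like the sketch below. It assumes the line 
in question fills in a sockaddr_in before calling bind(); makeBindAddress is a 
hypothetical name and does not exist in process.cpp.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: build the address handed to bind(). 'configured' is a
// network-order IPv4 address, where 0 means that no address was configured.
sockaddr_in makeBindAddress(in_addr_t configured, uint16_t port) {
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  // Bind to the configured interface when there is one, else to all of them.
  addr.sin_addr.s_addr = configured != 0 ? configured : htonl(INADDR_ANY);
  return addr;
}
```

Binding to the configured address would also make getsockname() report that 
address, at the cost of no longer listening on the other interfaces.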


This addresses bug MESOS-165.
    https://issues.apache.org/jira/browse/MESOS-165


Diffs
-----

  third_party/libprocess/src/process.cpp 7433be8 

Diff: https://reviews.apache.org/r/4355/diff


Testing
-------


Thanks,

Charles


                
> Slaves die after initial registration with master with "Network is 
> unreachable" error
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-165
>                 URL: https://issues.apache.org/jira/browse/MESOS-165
>             Project: Mesos
>          Issue Type: Bug
>          Components: master, slave
>         Environment: Scientific Linux 6.2 internal cluster
>            Reporter: Jessica J
>            Assignee: Charles Reiss
>            Priority: Blocker
>
> I am using a cluster in which only the master is externally accessible, so 
> when I start the master, I set --ip to one of its internal IP addresses so 
> that it can communicate with its slaves. I have also tried setting this ip 
> address in mesos-env.sh (in the deploy directory) by setting LIBPROCESS_IP, 
> but each time the master starts, it says that it is running at the external 
> IP address (as if it is ignoring the --ip or LIBPROCESS_IP options).
> When I start a slave, I tell it that the master is at an internal IP address 
> (no matter what the master says it's running at), so the initial connection 
> is successful. (I get messages output from both the slave and the master 
> saying the connection was successful.) However, after registering, the slave 
> *immediately* dies. My guess is that upon successful connection, the master 
> tells the slave to communicate with it on the external IP address, but since 
> the slave has no access to the Internet, any further communication fails. 
> The following is the error message the slave gives when it dies:
> F0314 12:25:45.196940 13406 process.cpp:1576] Failed to link, connect: 
> Network is unreachable [101]
> *** Check failure stack trace: ***
>     @     0x7f7d6be3342d  google::LogMessage::Fail()
>     @     0x7f7d6be36ae7  google::LogMessage::SendToLog()
>     @     0x7f7d6be36066  google::LogMessage::Flush()
>     @     0x7f7d6be36279  google::LogMessage::~LogMessage()
>     @     0x7f7d6be39351  google::ErrnoLogMessage::~ErrnoLogMessage()
>     @     0x7f7d6be47319  process::SocketManager::link()
>     @     0x7f7d6be4bc88  process::ProcessManager::link()
>     @     0x7f7d6be4ed98  process::ProcessBase::link()
>     @     0x7f7d6bcaf575  mesos::internal::slave::Slave::newMasterDetected()
>     @     0x7f7d6bcbbd7f  ProtobufProcess<>::handler1<>()
>     @     0x7f7d6bcbe477  ProtobufProcess<>::visit()
>     @     0x7f7d6be504e0  process::MessageEvent::visit()
>     @     0x7f7d6be4b448  process::ProcessManager::resume()
>     @     0x7f7d6be43bae  process::schedule()
>     @     0x7f7d6b5a77f1  start_thread
>     @     0x7f7d6a93c92d  clone
> Aborted
> I have looked at the code (master.cpp, process.cpp, main.cpp, slave.cpp, 
> mesos-master.sh, etc.) and tried to determine why the ip option is getting 
> ignored, but I have thus far been unsuccessful.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
