Hi Joachim,

The problem here is that the slave is reporting the IP address 127.0.1.1 for
itself. The master tries to open a reverse connection to the slave (to
127.0.1.1:42553, a loopback address that points back at the master's own
machine) and fails, so it thinks that the slave has died. You can fix this
either by configuring /etc/hosts on the slave node to list its external IP
for its hostname rather than 127.0.1.1, or by passing the --ip argument to
mesos-slave.
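
To check whether a node is affected before starting the slave, a small
diagnostic along these lines may help. It is only a sketch: it assumes the
slave derives its address by resolving the local hostname (which is what the
/etc/hosts fix implies), and the helper names is_loopback and
resolved_hostname_ip are made up for illustration, not part of Mesos.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address lies in a loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def resolved_hostname_ip() -> str:
    """Resolve the local hostname to an IPv4 address via the system resolver.

    Assumption: this mirrors how the slave picks the address it advertises.
    """
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    try:
        ip = resolved_hostname_ip()
    except OSError as err:
        # The hostname may not resolve at all on a misconfigured node.
        print(f"Could not resolve local hostname: {err}")
    else:
        if is_loopback(ip):
            print(f"Hostname resolves to loopback address {ip}; "
                  "fix /etc/hosts or pass --ip to mesos-slave.")
        else:
            print(f"Hostname resolves to {ip}; other machines may be able to reach it.")
```

If the script reports a loopback address on the slave node, either of the two
fixes above should apply.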

Matei

On Jun 9, 2011, at 12:39 AM, Joachim Karnbach-Mink wrote:

> Hi all,
> 
> I am currently setting up my first Mesos cluster on Ubuntu Lucid 64-bit and
> have run into some trouble. The master starts without any problems:
> 
> ./bin/mesos-master --ip=192.168.1.163
> I0609 09:30:26.087971 28453 logging.cpp:40] Logging to
> /home/jkm/mesos-mesos-81c4e62/logs
> I0609 09:30:26.088764 28453 main.cpp:66] Build: 2011-06-08 11:05:22 by jkm
> I0609 09:30:26.088783 28453 main.cpp:67] Starting Mesos master
> I0609 09:30:26.090169 28453 webui.cpp:63] Starting master web UI on port
> 8080
> I0609 09:30:26.090363 28456 webui.cpp:31] Web UI thread started
> I0609 09:30:26.099850 28454 master.cpp:258] Master started at mesos://
> [email protected]:5050
> I0609 09:30:26.100067 28454 master.cpp:268] Master ID: 201106090930-0
> I0609 09:30:26.100087 28454 master.cpp:1124] Creating "simple" allocator
> I0609 09:30:26.100646 28454 master.cpp:286] New master detected ... maybe
> it's us!
> I0609 09:30:26.109102 28456 webui.cpp:43] Loading webui/master/webui.py
> Bottle server starting up (using WSGIRefServer())...
> Listening on http://0.0.0.0:8080/
> 
> But when I try to connect a slave over the network, I get "Process exited"
> right after registering.
> 
> On the slave:
> ./bin/mesos-slave --url=mesos://[email protected]:5050
> I0609 09:32:27.657230  8397 logging.cpp:40] Logging to
> /home/jkm/mesos-mesos-81c4e62/logs
> I0609 09:32:27.657889  8397 main.cpp:66] Creating "process" isolation module
> I0609 09:32:27.657948  8397 main.cpp:74] Build: 2011-06-08 11:34:46 by jkm
> I0609 09:32:27.657965  8397 main.cpp:75] Starting Mesos slave
> I0609 09:32:27.660188  8397 webui.cpp:72] Starting slave web UI on port 8081
> I0609 09:32:27.660663  8398 slave.cpp:149] Slave started at
> [email protected]:42553
> I0609 09:32:27.661268  8398 slave.cpp:175] New master at
> [email protected]:5050 with ID:0
> I0609 09:32:27.661519  8400 webui.cpp:32] Web UI thread started
> I0609 09:32:27.679116  8400 webui.cpp:45] Loading webui/slave/webui.py
> Bottle server starting up (using WSGIRefServer())...
> Listening on http://0.0.0.0:8081/
> 
> On the master:
> I0609 09:32:27.666820 28454 master.cpp:481] Registering slave
> 201106090930-0-0 at [email protected]:42553
> I0609 09:32:27.667151 28454 simple_allocator.cpp:36] Added slave
> 201106090930-0-0
> I0609 09:32:27.667266 28454 master.cpp:722] Process exited:
> [email protected]:42553
> I0609 09:32:27.667290 28454 master.cpp:734] slave 201106090930-0-0
> disconnected
> I0609 09:32:27.667328 28454 simple_allocator.cpp:45] Removed slave
> 201106090930-0-0
> 
> If I start a slave session on the master this works fine:
> On the master:
> I0609 09:35:12.966112 28454 master.cpp:481] Registering slave
> 201106090930-0-1 at [email protected]:58401
> I0609 09:35:12.966351 28454 simple_allocator.cpp:36] Added slave
> 201106090930-0-1
> 
> Does anybody have an idea where I should look? The log files contain no
> additional information. I tried this with different Mesos versions, but all
> show the same behavior.
> 
> Thanks a lot,
> Joachim
