Hi Joachim,

The problem here is that the slave is reporting the IP address 127.0.1.1 for itself. The master tries to open a reverse connection to the slave (to [email protected]:42553) and fails, so it thinks that the slave has died. Since 127.0.1.1 is a loopback address, that connection never leaves the master's own machine. You can fix this by either configuring /etc/hosts on the slave node to list its external IP for its hostname rather than 127.0.1.1, or passing the --ip argument to mesos-slave.
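If you want to confirm this on the slave before changing anything, here is a rough sketch of a check. It assumes the slave derives its address from an ordinary hostname lookup, which I have not verified against the Mesos source; the function names are just ones I made up for this example.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address lies in a loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def resolve_own_address() -> str:
    """Resolve this machine's hostname to an IPv4 address.

    Assumption: this approximates the lookup the slave performs; /etc/hosts
    is normally consulted first, so a 127.0.1.1 entry there shows up here.
    """
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    try:
        addr = resolve_own_address()
    except OSError as exc:
        print(f"could not resolve hostname: {exc}")
    else:
        if is_loopback(addr):
            print(f"{addr} is a loopback address; other machines cannot reach it")
        else:
            print(f"{addr} looks routable")
```

If this reports a loopback address when run on the slave, either of the two fixes above should apply.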
Matei

On Jun 9, 2011, at 12:39 AM, Joachim Karnbach-Mink wrote:

> Hi all,
>
> I currently setup my first Mesos cluster on Ubuntu Lucid 64Bit and run in
> some trouble. The master starts without any problems:
>
> ./bin/mesos-master --ip=192.168.1.163
> I0609 09:30:26.087971 28453 logging.cpp:40] Logging to
> /home/jkm/mesos-mesos-81c4e62/logs
> I0609 09:30:26.088764 28453 main.cpp:66] Build: 2011-06-08 11:05:22 by jkm
> I0609 09:30:26.088783 28453 main.cpp:67] Starting Mesos master
> I0609 09:30:26.090169 28453 webui.cpp:63] Starting master web UI on port
> 8080
> I0609 09:30:26.090363 28456 webui.cpp:31] Web UI thread started
> I0609 09:30:26.099850 28454 master.cpp:258] Master started at mesos://
> [email protected]:5050
> I0609 09:30:26.100067 28454 master.cpp:268] Master ID: 201106090930-0
> I0609 09:30:26.100087 28454 master.cpp:1124] Creating "simple" allocator
> I0609 09:30:26.100646 28454 master.cpp:286] New master detected ... maybe
> it's us!
> I0609 09:30:26.109102 28456 webui.cpp:43] Loading webui/master/webui.py
> Bottle server starting up (using WSGIRefServer())...
> Listening on http://0.0.0.0:8080/
>
> But if I want to connect a slave via the network I got Process exited after
> registering.
>
> On the slave:
> ./bin/mesos-slave --url=mesos://[email protected]:5050
> I0609 09:32:27.657230 8397 logging.cpp:40] Logging to
> /home/jkm/mesos-mesos-81c4e62/logs
> I0609 09:32:27.657889 8397 main.cpp:66] Creating "process" isolation module
> I0609 09:32:27.657948 8397 main.cpp:74] Build: 2011-06-08 11:34:46 by jkm
> I0609 09:32:27.657965 8397 main.cpp:75] Starting Mesos slave
> I0609 09:32:27.660188 8397 webui.cpp:72] Starting slave web UI on port 8081
> I0609 09:32:27.660663 8398 slave.cpp:149] Slave started at
> [email protected]:42553
> I0609 09:32:27.661268 8398 slave.cpp:175] New master at
> [email protected]:5050 with ID:0
> I0609 09:32:27.661519 8400 webui.cpp:32] Web UI thread started
> I0609 09:32:27.679116 8400 webui.cpp:45] Loading webui/slave/webui.py
> Bottle server starting up (using WSGIRefServer())...
> Listening on http://0.0.0.0:8081/
>
> On the master:
> I0609 09:32:27.666820 28454 master.cpp:481] Registering slave
> 201106090930-0-0 at [email protected]:42553
> I0609 09:32:27.667151 28454 simple_allocator.cpp:36] Added slave
> 201106090930-0-0
> I0609 09:32:27.667266 28454 master.cpp:722] Process exited:
> [email protected]:42553
> I0609 09:32:27.667290 28454 master.cpp:734] slave 201106090930-0-0
> disconnected
> I0609 09:32:27.667328 28454 simple_allocator.cpp:45] Removed slave
> 201106090930-0-0
>
> If I start a slave session on the master this works fine:
> On the master:
> I0609 09:35:12.966112 28454 master.cpp:481] Registering slave
> 201106090930-0-1 at [email protected]:58401
> I0609 09:35:12.966351 28454 simple_allocator.cpp:36] Added slave
> 201106090930-0-1
>
> Anybody an idea where I can have a look at? In the log files there are no
> additional informations. I tried this with different Mesos versions but all
> have the same behavior.
>
> Thanks a lot,
> Joachim
