Hi guys,

While testing Hadoop on Mesos, we ran into a master crash this morning. We are running code built from trunk, which I pulled two weeks ago.
I checked the master log and found nothing unusual in it. There is also no error log for the master.

I also have a question about Mesos failover. We use a single-master configuration. When the master crashes, I restart it on the same IP and port, but none of the slaves can re-register with the new master. What is the purpose of this design?

Thanks. The last part of the logs is shown here.

Production: [qa@hd1dz] ~/mesos-log$ tail mesos-master.hd1dz.prod.mediav.com.root.log.INFO.20130510-142017.27386
I0522 11:04:04.255683 27392 master.cpp:1498] Processing reply for offer 201305101420-252063498-5050-27386-194813 on slave 201305101420-252063498-5050-27386-5 (hd9dz.prod.mediav.com) for framework 201305101420-252063498-5050-27386-0056
I0522 11:04:04.255748 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-3 for 5.000000000000000secs
I0522 11:04:04.255822 27392 master.cpp:1498] Processing reply for offer 201305101420-252063498-5050-27386-194814 on slave 201305101420-252063498-5050-27386-1 (hd3dz.prod.mediav.com) for framework 201305101420-252063498-5050-27386-0056
I0522 11:04:04.255862 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-5 for 5.000000000000000secs
I0522 11:04:04.255936 27392 master.cpp:1498] Processing reply for offer 201305101420-252063498-5050-27386-194815 on slave 201305101420-252063498-5050-27386-2 (hd5dz.prod.mediav.com) for framework 201305101420-252063498-5050-27386-0056
I0522 11:04:04.255980 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-1 for 5.000000000000000secs
I0522 11:04:04.256060 27392 master.cpp:1498] Processing reply for offer 201305101420-252063498-5050-27386-194816 on slave 201305101420-252063498-5050-27386-0 (hd2dz.prod.mediav.com) for framework 201305101420-252063498-5050-27386-0056
I0522 11:04:04.256098 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-2 for 5.000000000000000secs
I0522 11:04:04.256216 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-0 for 5.000000000000000secs
W0522 11:04:07.555552 27394 master.cpp:82] No whitelist given. Advertising offers for all slaves

Production: [qa@hd1dz] ~/mesos-log$ tail mesos-master.hd1dz.prod.mediav.com.root.log.WARNING.20130510-142017.27386
W0522 11:03:22.105432 27394 master.cpp:82] No whitelist given. Advertising offers for all slaves
W0522 11:03:27.152434 27398 master.cpp:82] No whitelist given. Advertising offers for all slaves
W0522 11:03:32.153389 27393 master.cpp:82] No whitelist given. Advertising offers for all slaves
W0522 11:03:37.549747 27389 master.cpp:82] No whitelist given. Advertising offers for all slaves
W0522 11:03:42.550670 27391 master.cpp:82] No whitelist given. Advertising offers for all slaves
W0522 11:03:47.551592 27396 master.cpp:82] No whitelist given. Advertising offers for all slaves
W0522 11:03:52.552641 27399 master.cpp:82] No whitelist given. Advertising offers for all slaves
W0522 11:03:57.553750 27392 master.cpp:82] No whitelist given. Advertising offers for all slaves
W0522 11:04:02.554628 27397 master.cpp:82] No whitelist given. Advertising offers for all slaves
W0522 11:04:07.555552 27394 master.cpp:82] No whitelist given. Advertising offers for all slaves

Guodong
