> I've checked the log of master, there is nothing strange in the log. And
> there is no error log in master.

We recently added code to the master/slave to dump stack traces if they exit because they received a signal. I would recommend pulling that code in (it is on trunk) and reporting back with any trace.

> I have another question about mesos failover.
> If we use single master configuration, when the master crashes, I restart
> it(same ip and port). But all the slaves can not reregister to the new
> master. What is the purpose of this design?

Currently, we only support automatic re-registration of slaves when both master(s) and slaves use zookeeper.

> Thanks.
>
> the last part of log is like this.
>
> Production: [qa@hd1dz] ~/mesos-log$ tail mesos-master.hd1dz.prod.mediav.com.root.log.INFO.20130510-142017.27386
> I0522 11:04:04.255683 27392 master.cpp:1498] Processing reply for offer 201305101420-252063498-5050-27386-194813 on slave 201305101420-252063498-5050-27386-5 (hd9dz.prod.mediav.com) for framework 201305101420-252063498-5050-27386-0056
> I0522 11:04:04.255748 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-3 for 5.000000000000000secs
> I0522 11:04:04.255822 27392 master.cpp:1498] Processing reply for offer 201305101420-252063498-5050-27386-194814 on slave 201305101420-252063498-5050-27386-1 (hd3dz.prod.mediav.com) for framework 201305101420-252063498-5050-27386-0056
> I0522 11:04:04.255862 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-5 for 5.000000000000000secs
> I0522 11:04:04.255936 27392 master.cpp:1498] Processing reply for offer 201305101420-252063498-5050-27386-194815 on slave 201305101420-252063498-5050-27386-2 (hd5dz.prod.mediav.com) for framework 201305101420-252063498-5050-27386-0056
> I0522 11:04:04.255980 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-1 for 5.000000000000000secs
> I0522 11:04:04.256060 27392 master.cpp:1498] Processing reply for offer 201305101420-252063498-5050-27386-194816 on slave 201305101420-252063498-5050-27386-0 (hd2dz.prod.mediav.com) for framework 201305101420-252063498-5050-27386-0056
> I0522 11:04:04.256098 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-2 for 5.000000000000000secs
> I0522 11:04:04.256216 27393 hierarchical_allocator_process.hpp:497] Framework 201305101420-252063498-5050-27386-0056 filtered slave 201305101420-252063498-5050-27386-0 for 5.000000000000000secs
> W0522 11:04:07.555552 27394 master.cpp:82] No whitelist given. Advertising offers for all slaves
>
> Production: [qa@hd1dz] ~/mesos-log$ tail mesos-master.hd1dz.prod.mediav.com.root.log.WARNING.20130510-142017.27386
> W0522 11:03:22.105432 27394 master.cpp:82] No whitelist given. Advertising offers for all slaves
> W0522 11:03:27.152434 27398 master.cpp:82] No whitelist given. Advertising offers for all slaves
> W0522 11:03:32.153389 27393 master.cpp:82] No whitelist given. Advertising offers for all slaves
> W0522 11:03:37.549747 27389 master.cpp:82] No whitelist given. Advertising offers for all slaves
> W0522 11:03:42.550670 27391 master.cpp:82] No whitelist given. Advertising offers for all slaves
> W0522 11:03:47.551592 27396 master.cpp:82] No whitelist given. Advertising offers for all slaves
> W0522 11:03:52.552641 27399 master.cpp:82] No whitelist given. Advertising offers for all slaves
> W0522 11:03:57.553750 27392 master.cpp:82] No whitelist given. Advertising offers for all slaves
> W0522 11:04:02.554628 27397 master.cpp:82] No whitelist given. Advertising offers for all slaves
> W0522 11:04:07.555552 27394 master.cpp:82] No whitelist given. Advertising offers for all slaves
>
> Guodong
