Hi Guodong,

This is a known issue (https://issues.apache.org/jira/browse/MESOS-300), though we have not been able to find the root cause so far. If you can reproduce this behavior consistently, please add the exact steps to reproduce it to that ticket. That would help us diagnose and fix it.
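In case it helps with writing up the reproduction, here is a minimal sketch of a script that refreshes the master page in a loop and reports which request first fails. It is only an illustration: `fetch_status` and `refresh_until_failure` are hypothetical helper names, not part of Mesos, and the URL is assumed to be the one from the report (http://localhost:5050).

```python
import http.client
import time
import urllib.request


def fetch_status(url, timeout=5.0):
    """Fetch the page once and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def refresh_until_failure(url, attempts=50, delay=0.5, fetch=fetch_status):
    """Request `url` up to `attempts` times, pausing `delay` seconds between tries.

    Returns the 1-based number of the first request that failed (for example
    because the master aborted and closed the connection), or None if every
    request succeeded. `fetch` is injectable so the loop can be exercised
    without a running master.
    """
    for attempt in range(1, attempts + 1):
        try:
            fetch(url)
        except (OSError, http.client.HTTPException) as exc:
            # URLError, connection resets and timeouts are all OSError subclasses.
            print(f"request {attempt} failed: {exc!r}")
            return attempt
        time.sleep(delay)
    return None
```

Calling `refresh_until_failure("http://localhost:5050")` after the slaves have registered, and noting the returned request number together with the master log, would give a step that is easy to repeat on the ticket.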
Thanks,

On Mon, Apr 15, 2013 at 6:39 PM, 王国栋 <[email protected]> wrote:
> hi,
>
> When I start 2 slaves to register to the master one by one, and then
> refresh the master http monitoring page, the master crashes. I restart the
> master(without restart slave), and it crashes again when I refresh the
> page(http://localhost:5050).
> The log of the master is as follow.
>
> I0416 09:36:35.325883 4910 main.cpp:116] Build: 2013-04-15 10:31:38 by guodong
> I0416 09:36:35.326510 4910 main.cpp:117] Starting Mesos master
> I0416 09:36:35.326918 4927 master.cpp:309] Master started on 127.0.1.1:5050
> I0416 09:36:35.327046 4927 master.cpp:324] Master ID: 201304160936-16842879-5050-4910
> W0416 09:36:35.327277 4924 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0416 09:36:35.329146 4927 master.cpp:603] Elected as master!
> I0416 09:36:35.811032 4925 master.cpp:968] Attempting to register slave on guodong-OptiPlex-990 at slave(1)@127.0.1.1:46056
> I0416 09:36:35.811139 4925 master.cpp:1224] Master now considering a slave at guodong-OptiPlex-990:46056 as active
> I0416 09:36:35.811259 4925 master.cpp:1862] Adding slave 201304160936-16842879-5050-4910-0 at guodong-OptiPlex-990 with cpus=4; mem=10897; ports=[31000-32000]; disk=399611
> I0416 09:36:35.811786 4925 master.cpp:537] Slave 201304160936-16842879-5050-4910-0(guodong-OptiPlex-990) disconnected
> I0416 09:36:35.811841 4925 master.cpp:542] Removing disconnected slave 201304160936-16842879-5050-4910-0(guodong-OptiPlex-990) because it is not checkpointing!
> I0416 09:36:35.811887 4924 hierarchical_allocator_process.hpp:395] Added slave 201304160936-16842879-5050-4910-0 (guodong-OptiPlex-990) with cpus=4; mem=10897; ports=[31000-32000]; disk=399611 (and cpus=4; mem=10897; ports=[31000-32000]; disk=399611 available)
> I0416 09:36:35.812057 4924 hierarchical_allocator_process.hpp:423] Removed slave 201304160936-16842879-5050-4910-0
> I0416 09:36:36.812571 4925 master.cpp:968] Attempting to register slave on guodong-OptiPlex-990 at slave(1)@127.0.1.1:46056
> I0416 09:36:36.812705 4925 master.cpp:1224] Master now considering a slave at guodong-OptiPlex-990:46056 as active
> I0416 09:36:36.812855 4925 master.cpp:1862] Adding slave 201304160936-16842879-5050-4910-1 at guodong-OptiPlex-990 with cpus=4; mem=10897; ports=[31000-32000]; disk=399611
> I0416 09:36:36.813194 4925 master.cpp:537] Slave 201304160936-16842879-5050-4910-1(guodong-OptiPlex-990) disconnected
> I0416 09:36:36.813256 4925 master.cpp:542] Removing disconnected slave 201304160936-16842879-5050-4910-1(guodong-OptiPlex-990) because it is not checkpointing!
> I0416 09:36:36.813294 4926 hierarchical_allocator_process.hpp:395] Added slave 201304160936-16842879-5050-4910-1 (guodong-OptiPlex-990) with cpus=4; mem=10897; ports=[31000-32000]; disk=399611 (and cpus=4; mem=10897; ports=[31000-32000]; disk=399611 available)
> I0416 09:36:36.813433 4926 hierarchical_allocator_process.hpp:423] Removed slave 201304160936-16842879-5050-4910-1
> I0416 09:36:37.814275 4925 master.cpp:968] Attempting to register slave on guodong-OptiPlex-990 at slave(1)@127.0.1.1:46056
> I0416 09:36:37.814412 4925 master.cpp:1224] Master now considering a slave at guodong-OptiPlex-990:46056 as active
> I0416 09:36:37.814467 4925 master.cpp:1862] Adding slave 201304160936-16842879-5050-4910-2 at guodong-OptiPlex-990 with cpus=4; mem=10897; ports=[31000-32000]; disk=399611
> I0416 09:36:37.814831 4925 master.cpp:537] Slave 201304160936-16842879-5050-4910-2(guodong-OptiPlex-990) disconnected
> I0416 09:36:37.814882 4925 master.cpp:542] Removing disconnected slave 201304160936-16842879-5050-4910-2(guodong-OptiPlex-990) because it is not checkpointing!
> I0416 09:36:37.814900 4924 hierarchical_allocator_process.hpp:395] Added slave 201304160936-16842879-5050-4910-2 (guodong-OptiPlex-990) with cpus=4; mem=10897; ports=[31000-32000]; disk=399611 (and cpus=4; mem=10897; ports=[31000-32000]; disk=399611 available)
> I0416 09:36:37.815040 4924 hierarchical_allocator_process.hpp:423] Removed slave 201304160936-16842879-5050-4910-2
> I0416 09:36:38.815996 4925 master.cpp:968] Attempting to register slave on guodong-OptiPlex-990 at slave(1)@127.0.1.1:46056
> I0416 09:36:38.816112 4925 master.cpp:1224] Master now considering a slave at guodong-OptiPlex-990:46056 as active
> I0416 09:36:38.816213 4925 master.cpp:1862] Adding slave 201304160936-16842879-5050-4910-3 at guodong-OptiPlex-990 with cpus=4; mem=10897; ports=[31000-32000]; disk=399611
> I0416 09:36:38.816763 4925 master.cpp:537] Slave 201304160936-16842879-5050-4910-3(guodong-OptiPlex-990) disconnected
> I0416 09:36:38.816838 4924 hierarchical_allocator_process.hpp:395] Added slave 201304160936-16842879-5050-4910-3 (guodong-OptiPlex-990) with cpus=4; mem=10897; ports=[31000-32000]; disk=399611 (and cpus=4; mem=10897; ports=[31000-32000]; disk=399611 available)
> I0416 09:36:38.816999 4925 master.cpp:542] Removing disconnected slave 201304160936-16842879-5050-4910-3(guodong-OptiPlex-990) because it is not checkpointing!
> I0416 09:36:38.817443 4927 hierarchical_allocator_process.hpp:423] Removed slave 201304160936-16842879-5050-4910-3
> I0416 09:36:39.817735 4925 master.cpp:968] Attempting to register slave on guodong-OptiPlex-990 at slave(1)@127.0.1.1:46056
> I0416 09:36:39.818085 4925 master.cpp:1224] Master now considering a slave at guodong-OptiPlex-990:46056 as active
> I0416 09:36:39.818320 4925 master.cpp:1862] Adding slave 201304160936-16842879-5050-4910-4 at guodong-OptiPlex-990 with cpus=4; mem=10897; ports=[31000-32000]; disk=399611
> I0416 09:36:39.818814 4925 master.cpp:537] Slave 201304160936-16842879-5050-4910-4(guodong-OptiPlex-990) disconnected
> I0416 09:36:39.818878 4927 hierarchical_allocator_process.hpp:395] Added slave 201304160936-16842879-5050-4910-4 (guodong-OptiPlex-990) with cpus=4; mem=10897; ports=[31000-32000]; disk=399611 (and cpus=4; mem=10897; ports=[31000-32000]; disk=399611 available)
> I0416 09:36:39.818980 4925 master.cpp:542] Removing disconnected slave 201304160936-16842879-5050-4910-4(guodong-OptiPlex-990) because it is not checkpointing!
> I0416 09:36:39.819612 4924 hierarchical_allocator_process.hpp:423] Removed slave 201304160936-16842879-5050-4910-4
> W0416 09:36:40.328641 4925 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0416 09:36:40.819702 4926 master.cpp:968] Attempting to register slave on guodong-OptiPlex-990 at slave(1)@127.0.1.1:46056
> I0416 09:36:40.820044 4926 master.cpp:1224] Master now considering a slave at guodong-OptiPlex-990:46056 as active
> I0416 09:36:40.820314 4926 master.cpp:1862] Adding slave 201304160936-16842879-5050-4910-5 at guodong-OptiPlex-990 with cpus=4; mem=10897; ports=[31000-32000]; disk=399611
> I0416 09:36:40.820997 4926 master.cpp:537] Slave 201304160936-16842879-5050-4910-5(guodong-OptiPlex-990) disconnected
> I0416 09:36:40.821081 4924 hierarchical_allocator_process.hpp:395] Added slave 201304160936-16842879-5050-4910-5 (guodong-OptiPlex-990) with cpus=4; mem=10897; ports=[31000-32000]; disk=399611 (and cpus=4; mem=10897; ports=[31000-32000]; disk=399611 available)
> I0416 09:36:40.821246 4926 master.cpp:542] Removing disconnected slave 201304160936-16842879-5050-4910-5(guodong-OptiPlex-990) because it is not checkpointing!
> I0416 09:36:40.821862 4927 hierarchical_allocator_process.hpp:423] Removed slave 201304160936-16842879-5050-4910-5
> I0416 09:36:41.821005 4926 master.cpp:968] Attempting to register slave on guodong-OptiPlex-990 at slave(1)@127.0.1.1:46056
> I0416 09:36:41.821271 4926 master.cpp:1224] Master now considering a slave at guodong-OptiPlex-990:46056 as active
> I0416 09:36:41.821451 4926 master.cpp:1862] Adding slave 201304160936-16842879-5050-4910-6 at guodong-OptiPlex-990 with cpus=4; mem=10897; ports=[31000-32000]; disk=399611
> I0416 09:36:41.821929 4926 master.cpp:537] Slave 201304160936-16842879-5050-4910-6(guodong-OptiPlex-990) disconnected
> I0416 09:36:41.822000 4924 hierarchical_allocator_process.hpp:395] Added slave 201304160936-16842879-5050-4910-6 (guodong-OptiPlex-990) with cpus=4; mem=10897; ports=[31000-32000]; disk=399611 (and cpus=4; mem=10897; ports=[31000-32000]; disk=399611 available)
> I0416 09:36:41.822161 4926 master.cpp:542] Removing disconnected slave 201304160936-16842879-5050-4910-6(guodong-OptiPlex-990) because it is not checkpointing!
> I0416 09:36:41.822906 4925 hierarchical_allocator_process.hpp:423] Removed slave 201304160936-16842879-5050-4910-6
> I0416 09:36:42.822779 4927 master.cpp:968] Attempting to register slave on guodong-OptiPlex-990 at slave(1)@127.0.1.1:46056
> I0416 09:36:42.823669 4927 master.cpp:1224] Master now considering a slave at guodong-OptiPlex-990:46056 as active
> I0416 09:36:42.824539 4927 master.cpp:1862] Adding slave 201304160936-16842879-5050-4910-7 at guodong-OptiPlex-990 with cpus=4; mem=10897; ports=[31000-32000]; disk=399611
> I0416 09:36:42.892474 4927 master.cpp:537] Slave 201304160936-16842879-5050-4910-7(guodong-OptiPlex-990) disconnected
> I0416 09:36:42.908231 4927 master.cpp:542] Removing disconnected slave 201304160936-16842879-5050-4910-7(guodong-OptiPlex-990) because it is not checkpointing!
> I0416 09:36:42.892504 4925 hierarchical_allocator_process.hpp:395] Added slave 201304160936-16842879-5050-4910-7 (guodong-OptiPlex-990) with cpus=4; mem=10897; ports=[31000-32000]; disk=399611 (and cpus=4; mem=10897; ports=[31000-32000]; disk=399611 available)
> I0416 09:36:42.908664 4925 hierarchical_allocator_process.hpp:423] Removed slave 201304160936-16842879-5050-4910-7
> I0416 09:36:43.824604 4926 master.cpp:968] Attempting to register slave on guodong-OptiPlex-990 at slave(1)@127.0.1.1:46056
> I0416 09:36:43.825044 4926 master.cpp:1224] Master now considering a slave at guodong-OptiPlex-990:46056 as active
> I0416 09:36:43.825316 4926 master.cpp:1862] Adding slave 201304160936-16842879-5050-4910-8 at guodong-OptiPlex-990 with cpus=4; mem=10897; ports=[31000-32000]; disk=399611
> I0416 09:36:43.825809 4926 master.cpp:537] Slave 201304160936-16842879-5050-4910-8(guodong-OptiPlex-990) disconnected
> I0416 09:36:43.825865 4927 hierarchical_allocator_process.hpp:395] Added slave 201304160936-16842879-5050-4910-8 (guodong-OptiPlex-990) with cpus=4; mem=10897; ports=[31000-32000]; disk=399611 (and cpus=4; mem=10897; ports=[31000-32000]; disk=399611 available)
> I0416 09:36:43.825969 4926 master.cpp:542] Removing disconnected slave 201304160936-16842879-5050-4910-8(guodong-OptiPlex-990) because it is not checkpointing!
> I0416 09:36:43.826431 4924 hierarchical_allocator_process.hpp:423] Removed slave 201304160936-16842879-5050-4910-8
> F0416 09:36:44.198364 4928 process.cpp:1967] Check failed: outgoing.count(s) > 0
> *** Check failure stack trace: ***
>     @ 0x7f71eabed14d google::LogMessage::Fail()
>     @ 0x7f71eabf09df google::LogMessage::SendToLog()
>     @ 0x7f71eabeff17 google::LogMessage::Flush()
>     @ 0x7f71eabf0ebd google::LogMessageFatal::~LogMessageFatal()
>     @ 0x7f71eab03207 process::SocketManager::next()
>     @ 0x7f71eab0765b process::send_data()
>     @ 0x7f71eac2fdb1 ev_invoke_pending
>     @ 0x7f71eac348fd ev_loop
>     @ 0x7f71eaafc70b process::serve()
>     @ 0x7f71e8f0fe9a start_thread
>     @ 0x7f71e8c3ccbd (unknown)
> Aborted (core dumped)
>
>
> Guodong
>
