Hmm, nothing in the logs jumps out as suspicious. But the 75s from slave registration to its eventual removal corresponds to the health check timeout (15s * 5 retries). I would suggest running 'top' on the master and slave machines to see if either of them is heavily loaded.
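If it helps to check the timing yourself, here is a rough sketch of that arithmetic. It only assumes the 15s interval and 5 retries mentioned above and the timestamp layout of the log lines you posted; the function and parameter names are mine for illustration, not Mesos settings.

```python
from datetime import datetime


def health_check_window(interval_secs, retries):
    """Seconds of missed health checks before the master gives up on a slave."""
    return interval_secs * retries


def glog_time(line, year=2012):
    """Parse the timestamp of a log line such as 'I1002 01:00:02.868795 ...'.

    The first field is a severity letter followed by MMDD; the second is the
    time of day with microseconds. The year is not in the log, so it is
    passed in (2012 here, taken from the dates of the emails).
    """
    fields = line.split()
    stamp = f"{year}{fields[0][1:]} {fields[1]}"
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


def elapsed_secs(first_line, second_line):
    """Seconds elapsed between two log lines."""
    return (glog_time(second_line) - glog_time(first_line)).total_seconds()


# 15s * 5 retries, as described above.
window = health_check_window(15, 5)

# Two slave log lines copied from the original report.
disk_line = "I1002 01:00:02.868795 27122 slave.cpp:1160] Current disk usage 2.09%."
shutdown_line = "I1002 01:00:17.974647 27123 slave.cpp:335] Slave asked to shut down"
gap = elapsed_secs(disk_line, shutdown_line)

print(f"health check window: {window}s")
print(f"gap between the two slave log lines: {gap:.3f}s")
```

Passing the slave's registration line and the master's "Removing slave" line to elapsed_secs should show whether the gap really lines up with the 75s window.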
@vinodkone

On Mon, Oct 1, 2012 at 8:30 PM, Scott Wang <[email protected]> wrote:

> Vinod,
>
> For sure they can connect with each other. I can see the slave on the
> webUI for about a minute or so then it just dropped out of cluster.
> Enclosed are the slave output and master output in that one minute.
> Maybe you can spot something that I am not doing right.
>
> Thanks,
> Scott
>
> On Mon, Oct 1, 2012 at 7:49 PM, Vinod Kone <[email protected]> wrote:
>
> > Looks like the slave is not responding to health checks from the master. Is
> > the network connection from master-->slave alright? is the machine hosting
> > slave is cpu starved? Those are some of the things, I would check for.
> >
> > @vinodkone
> >
> > On Mon, Oct 1, 2012 at 6:04 PM, Scott Wang <[email protected]> wrote:
> >
> >> I am trying to setup a small cluster a master and a slave but I am
> >> getting the following output and the slave just terminated.
> >>
> >> ------------------------ Slave output ------------------------
> >> I1002 01:00:02.868795 27122 slave.cpp:1160] Current disk usage 2.09%. Max allowed age: 6.85days
> >> I1002 01:00:17.974647 27123 slave.cpp:335] Slave asked to shut down
> >> I1002 01:00:17.974792 27123 slave.cpp:313] Slave terminating
> >>
> >> ------------------------ Master output ------------------------
> >> W1002 01:00:17.967651 11433 master.cpp:1173] Removing slave 201210020057-1994437898-5050-11419-1 at itvm638:34013 because it has been deactivated
> >> I1002 01:00:17.968174 11433 master.cpp:1182] Master now considering a slave at itvm638:34013 as inactive
> >> I1002 01:00:17.968328 11435 hierarchical_allocator_process.hpp:371] Removed slave 201210020057-1994437898-5050-11419-1
> >>
> >> Does anyone have any idea what I should do to prevent the slave going
> >> down by itself.
> >>
> >> Thanks,
> >> Scott
