Hi Michael,
thanks for the detailed answer; it has been helpful (especially the
log4j DEBUG level for all those classes).
> Check the logs to see if you can get a clue as to what is going on. Did
> the cluster HMaster get the shutdown signal? (Is it running the
> shutdown sequence?) Logs are in $HADOOP_HOME/logs. Look at the
> hbase-USERID-master-*log content. Might help if you up the log level to
> DEBUG (add the line 'log4j.logger.org.apache.hadoop.hbase.HMaster=DEBUG'
> to $HADOOP_HOME/conf/log4j.properties). Stack traces are also useful
> figuring where the programs are hung (Send a 'kill -QUIT PROCESS_ID'.
> The output will appear in the '*.out' logs).
I've been able to reproduce the shutdown problem. We deploy our
hadoop+hbase installation with a small script that automates the
boring parts.
The problem is that the script called stop-hbase.sh and, right after
it, stop-all.sh for the hadoop platform.
stop-hbase.sh returns immediately after it has been launched, while
hbase takes a while to shut down properly, so we were effectively
killing hadoop (and all the RPC facilities that hbase uses) while
hbase was still shutting down.
Maybe I didn't get the whole picture, but I've been able to solve the
problem by adding a 'wait until hbase shuts down' step to the script.
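
For reference, the wait step can be sketched roughly like this. It is
an illustration in Python, not the actual script we use. The function
names, the timeout values, and the use of `pgrep -f` to detect the
master JVM are all assumptions; the class name is the one from the
log4j line quoted above.

```python
import subprocess
import time


def wait_until_stopped(is_running, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll is_running() until it returns False or `timeout` seconds of
    sleeping have accumulated.

    Returns True if the process went away in time, False on timeout.
    """
    waited = 0.0
    while is_running():
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
    return True


def hmaster_running():
    """Assumption: the HBase master is visible to `pgrep -f` under its
    main class name. pgrep exits with 0 when at least one process matches."""
    result = subprocess.run(
        ["pgrep", "-f", "org.apache.hadoop.hbase.HMaster"],
        stdout=subprocess.DEVNULL,
    )
    return result.returncode == 0


def stop_hbase_then_hadoop(run=subprocess.run, is_running=hmaster_running,
                           timeout=120.0, interval=2.0, sleep=time.sleep):
    """Stop hbase, wait for the master to exit, and only then stop hadoop.

    The script paths are hypothetical and depend on the local layout.
    Returns False (leaving hadoop running) if hbase did not stop in time.
    """
    run(["bin/stop-hbase.sh"])
    if not wait_until_stopped(is_running, timeout, interval, sleep):
        return False
    run(["bin/stop-all.sh"])
    return True
```

Calling stop_hbase_then_hadoop() from the deployment script replaces
the two back-to-back stop calls. If it returns False, hadoop is left
running so that hbase can still finish its shutdown.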
> The outstanding log on improper shutdown should have been addressed by
> HADOOP-1527.
2007-08-28 04:33:01,316 ERROR org.apache.hadoop.hbase.HRegionServer:
Can not start region server because
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.dfs.SafeModeException: Cannot create directory
/*******/****************/hbase/log_xxx.xxx.xxx.xxx_60010. Name node
is in safe mode.
Safe mode will be turned off automatically.
Even if I shut down properly, I can reproduce this problem every time
I try to restart hbase. The first time I restart it, the HRegionServer
cannot create its log directory because the name node is in safe mode.
After it has triggered the name node to turn off safe mode, I can
restart hbase without problems.
Is it in some way related to HADOOP-1527? Probably the setSafeMode()
method is called on the log directory before the shutdown. I also
tried waiting a good amount of time, but there are no TTLs, if I'm not
wrong (I didn't find any in the sources).
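
If the first failure only means that the name node has not left safe
mode yet, a restart script could block on that before starting hbase.
A minimal sketch, assuming the hadoop build in use supports
`dfsadmin -safemode wait`; the function name and the `bin/hadoop` path
are hypothetical.

```python
import subprocess


def wait_for_safe_mode_off(hadoop_cmd="bin/hadoop", run=subprocess.run):
    """Run `hadoop dfsadmin -safemode wait`, which is expected to block
    until the name node has left safe mode.

    Returns True if the command exited with status 0.
    """
    cmd = [hadoop_cmd, "dfsadmin", "-safemode", "wait"]
    return run(cmd).returncode == 0
```

The script would call wait_for_safe_mode_off() right after starting
hadoop and launch hbase only if it returns True.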
I forgot to say (in the other mail as well) that we are running hadoop
and hbase nightlies, constantly updating to the latest successful
nightly build.
Best Regards,
-Michele Catasta