Michele Catasta wrote:
Hi,

we are having problems with the hbase scripts. Basically, when we run the
stop script, it is not able to gracefully kill the HMaster (instead, it
forks another HMaster that dies after a short while).

Hello Michele:

If you trace it, you will find that the stop-hbase.sh script invokes HMaster.main, which launches a client (HBaseAdmin) to invoke the shutdown method on the actual cluster HMaster. I'm guessing HBaseAdmin is stuck, unable to contact the remote HMaster, perhaps because it's trying to access the wrong address. (Because HBaseAdmin is running inside HMaster.main, it looks like there are two HMasters running when you do a process listing.)

Check the logs to see if you can get a clue as to what is going on. Did the cluster HMaster get the shutdown signal? (Is it running the shutdown sequence?) Logs are in $HADOOP_HOME/logs. Look at the hbase-USERID-master-*log content. It might help to raise the log level to DEBUG (add the line 'log4j.logger.org.apache.hadoop.hbase.HMaster=DEBUG' to $HADOOP_HOME/conf/log4j.properties). Stack traces are also useful for figuring out where the programs are hung (send a 'kill -QUIT PROCESS_ID'; the output will appear in the '*.out' logs).
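
The thread-dump step can be sketched as a small shell helper. This is only an illustration: the function name request_thread_dump is made up here, and it assumes you have already found the process id yourself (for example from a process listing). On SIGQUIT a JVM prints the stacks of all its threads to standard output and keeps running.

```shell
# Hypothetical helper (not part of the hbase scripts): ask a running JVM
# for a thread dump by sending it SIGQUIT. The JVM writes the stacks of
# all threads to its stdout (the '*.out' log here) and continues running.
request_thread_dump() {
    pid="$1"
    # Reject empty or non-numeric arguments.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: request_thread_dump PID" >&2
            return 2
            ;;
    esac
    # Reject zero, which would signal the whole process group.
    if [ "$pid" -le 0 ] 2>/dev/null; then
        echo "usage: request_thread_dump PID" >&2
        return 2
    fi
    # kill -0 only checks that the process exists and can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    kill -QUIT "$pid"
}
```

Call it as 'request_thread_dump PROCESS_ID' for each of the two HMaster processes in the listing, then compare the dumps in the '*.out' logs to see which one is the stuck HBaseAdmin client.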

Make an issue and attach the logs if it's not obvious to you what's going on, and we'll take a look.

The start script, on the other hand, is not able to start up the
HRegionserver, probably because the HMaster has been killed improperly
before. Taking a look at the logs, I found that it was complaining
because of an already existing directory inside HDFS (regionserver log
directory IIRC).
The outstanding log on improper shutdown should have been addressed by HADOOP-1527.

After I deleted it, it is dying for another reason
that I cannot understand:

2007-08-27 14:40:12,979 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 60010: starting
2007-08-27 14:40:12,979 INFO org.apache.hadoop.hbase.HRegionServer:
HRegionServer started at: 140.203.154.219:60010
2007-08-27 14:40:12,980 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 60010: starting
2007-08-27 14:40:12,984 INFO org.apache.hadoop.hbase.Leases: closing leases
2007-08-27 14:40:12,984 INFO org.apache.hadoop.hbase.Leases: leases closed
2007-08-27 14:40:12,984 INFO org.apache.hadoop.ipc.Server: Stopping
server on 60010
2007-08-27 14:40:12,985 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 60010: exiting

This is a snippet of the log file. Handlers are initialized and soon
after stopped. Maybe because I used start-hbase.sh even though an
HMaster instance was already up?

Should still work. Logging should say why a region server has decided to shut down. Perhaps the reason is present if you set the level to DEBUG? (I'm guessing it's because it can't find the master. Have you set the hbase.master property in hbase-site.xml appropriately for your cluster?) Add the following to your log4j.properties file:

log4j.logger.org.apache.hadoop.hbase.HStore=DEBUG
log4j.logger.org.apache.hadoop.hbase.HStoreFile=DEBUG
log4j.logger.org.apache.hadoop.hbase.HRegion=DEBUG
log4j.logger.org.apache.hadoop.hbase.HMemcache=DEBUG
log4j.logger.org.apache.hadoop.hbase.HRegionServer=DEBUG
log4j.logger.org.apache.hadoop.hbase.HLog=DEBUG

St.Ack


We would like to solve this situation in a graceful way, because the
last time we needed to erase all our hbase content.


Best Regards,
    -Michele Catasta
