Re: Hbase scripts problem

Michael Stack Mon, 27 Aug 2007 10:24:38 -0700

Michele Catasta wrote:

Hi,


we are having problems with the hbase scripts. Basically, when we run the
stop script, it is not able to gracefully kill the HMaster (instead, it
forks another HMaster that dies after a short while).


Hello Michele:

If you trace it, you will find that the stop-hbase.sh script invokes HMaster.main, which launches a client (HBaseAdmin) to invoke the shutdown method on the actual cluster HMaster. I'm guessing HBaseAdmin is stuck, unable to contact the remote HMaster, perhaps because it's trying to access the wrong address. (Because HBaseAdmin is running inside HMaster.main, it looks like there are two HMasters running when you do a process listing.)
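If you want to test the wrong-address theory quickly, here is a small sketch. It is not part of HBase; the class name MasterProbe and the fallback "localhost:60000" are placeholders. It only checks whether something accepts TCP connections at the host:port you pass it, which should be the master address your client is configured with. It does not speak the HBase RPC protocol, so a successful connect tells you the port is open, not that HBaseAdmin will get through.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical helper: checks plain TCP reachability of a "host:port"
 * master address. It does not talk the HBase RPC protocol.
 */
public class MasterProbe {

    /** Splits "host:port" into a socket address; rejects strings without a host or port. */
    static InetSocketAddress parseAddress(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        // NumberFormatException (an IllegalArgumentException) if the port is not numeric.
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    /** Returns true if a TCP connection to hostPort succeeds within timeoutMillis. */
    static boolean isReachable(String hostPort, int timeoutMillis) {
        InetSocketAddress addr = parseAddress(hostPort);
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass your cluster's master address; "localhost:60000" is only a placeholder.
        String master = args.length > 0 ? args[0] : "localhost:60000";
        boolean ok = isReachable(master, 2000);
        System.out.println(master + (ok ? " is accepting connections" : " is NOT reachable"));
    }
}
```

Run it from the machine where you invoke stop-hbase.sh, so that it sees the same network path that HBaseAdmin does.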

Check the logs to see if you can get a clue as to what is going on. Did the cluster HMaster get the shutdown signal? (Is it running the shutdown sequence?) Logs are in $HADOOP_HOME/logs. Look at the hbase-USERID-master-*log content. It might help if you raise the log level to DEBUG (add the line 'log4j.logger.org.apache.hadoop.hbase.HMaster=DEBUG' to $HADOOP_HOME/conf/log4j.properties). Stack traces are also useful for figuring out where the programs are hung (send a 'kill -QUIT PROCESS_ID'; the output will appear in the '*.out' logs).
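kill -QUIT is the right tool to use against the running daemons. If you have never looked at a JVM thread dump, here is a rough in-process analogue. It is a sketch, the class name ThreadDump is made up, and it uses the standard Thread.getAllStackTraces() call to print every live thread's name, state and stack frames. The dump the JVM writes on SIGQUIT is formatted differently, so treat this only as an illustration of what to look for: a thread that stays parked in the same frames across several dumps.

```java
import java.util.Map;

/**
 * Illustrative only: prints a stack trace for every live thread in this JVM,
 * similar in spirit to the dump 'kill -QUIT' makes a JVM write to stdout.
 */
public class ThreadDump {

    /** Formats all live threads' names, states and stack frames into one string. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```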

File an issue and attach the logs if it's not obvious to you what's going on, and we'll take a look.

The start script, on the other hand, is not able to start up the
HRegionServer, probably because the HMaster was killed improperly
before. Looking at the logs, I found that it was complaining
about an already existing directory inside HDFS (the regionserver log
directory, IIRC).

The outstanding log on improper shutdown should have been addressed by HADOOP-1527.

After I deleted it, the region server dies for another reason
that I cannot understand:

2007-08-27 14:40:12,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60010: starting
2007-08-27 14:40:12,979 INFO org.apache.hadoop.hbase.HRegionServer: HRegionServer started at: 140.203.154.219:60010
2007-08-27 14:40:12,980 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60010: starting
2007-08-27 14:40:12,984 INFO org.apache.hadoop.hbase.Leases: closing leases
2007-08-27 14:40:12,984 INFO org.apache.hadoop.hbase.Leases: leases closed
2007-08-27 14:40:12,984 INFO org.apache.hadoop.ipc.Server: Stopping server on 60010
2007-08-27 14:40:12,985 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60010: exiting

This is a snippet of the log file. The handlers are initialized and
stopped soon after. Maybe that is because I used start-hbase.sh even
though an HMaster instance was already up?

It should still work. Logging should say why a region server has decided to shut down. Perhaps the reason will show up if you set the level to DEBUG? (I'm guessing it's because it can't find the master. Have you set the hbase.master property in hbase-site.xml appropriately for your cluster?) Add the following to your log4j.properties file:


log4j.logger.org.apache.hadoop.hbase.HStore=DEBUG
log4j.logger.org.apache.hadoop.hbase.HStoreFile=DEBUG
log4j.logger.org.apache.hadoop.hbase.HRegion=DEBUG
log4j.logger.org.apache.hadoop.hbase.HMemcache=DEBUG
log4j.logger.org.apache.hadoop.hbase.HRegionServer=DEBUG
log4j.logger.org.apache.hadoop.hbase.HLog=DEBUG
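On the hbase.master question: a quick way to see exactly what a given hbase-site.xml sets is to read it the way any Hadoop-style configuration file is laid out (<configuration> containing <property> elements with <name> and <value>). The sketch below does that with the JDK's built-in DOM parser. The class name HBaseSiteCheck and the sample value "master.example.org:60000" are placeholders, not recommendations. Compare what it prints against the address your master actually listens on.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: looks up one property in a Hadoop-style *-site.xml file. */
public class HBaseSiteCheck {

    /** Returns the trimmed <value> of the named <property>, or null if the file does not set it. */
    static String propertyValue(InputStream xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Sample file contents; replace with a FileInputStream on your own hbase-site.xml.
        String sample =
              "<?xml version=\"1.0\"?>\n"
            + "<configuration>\n"
            + "  <property>\n"
            + "    <name>hbase.master</name>\n"
            + "    <value>master.example.org:60000</value>\n"
            + "  </property>\n"
            + "</configuration>\n";
        String master = propertyValue(
            new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)), "hbase.master");
        if (master == null) {
            System.out.println("hbase.master is not set in this file, so the default applies");
        } else {
            System.out.println("hbase.master = " + master);
        }
    }
}
```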

St.Ack

We would like to solve this situation gracefully, because last time
we had to erase all our hbase content.


Best Regards,
    -Michele Catasta
