[ https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaneal Manek updated HBASE-5003:
---------------------------------

    Status: Patch Available  (was: Open)

Simply has the master retry writing the version file 3 times (by default - but configurable). If it fails, the master shuts down gracefully.

Please disregard the first patch - it accidentally includes the buggy hbase-site.xml I was using to reproduce this issue.

> If the master is started with a wrong root dir, it gets stuck and can't be killed
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-5003
>                 URL: https://issues.apache.org/jira/browse/HBASE-5003
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.94.0, 0.90.7, 0.92.1
>
>         Attachments: hbase-5003-v2.patch, hbase-5003.patch
>
>
> Reported by a new user on IRC who tried to set hbase.rootdir to file:///~/hbase, the master gets stuck and cannot be killed. I tried something similar on my machine and it spins while logging:
> {quote}
> 2011-12-09 16:11:17,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
> 2011-12-09 16:11:27,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
> 2011-12-09 16:11:37,003 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
> {quote}
> The reason it cannot be stopped is that the master's main thread is stuck in there and will never be notified:
> {quote}
> "Master:0;su-jdcryans-01.local,51116,1323475535684" prio=5 tid=7f92b7a3c000 nid=0x1137ba000 waiting on condition [1137b9000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:297)
> 	at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:268)
> 	at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:339)
> 	at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
> 	at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:113)
> 	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
> 	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
> 	at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:218)
> 	at java.lang.Thread.run(Thread.java:680)
> {quote}
> It seems we should do a better handling of the exceptions we get in there, and die if we need to. It would make a better user experience.
> Maybe also do a check on hbase.rootdir before even starting the master.
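
For illustration only, the approach described in the update comment (a bounded number of attempts at writing the version file, after which the caller can shut down instead of sleeping forever) could be sketched roughly as below. This is not the attached patch. The class name VersionFileRetry, the VersionWriter interface, and the wait parameter are hypothetical; only the default of 3 attempts comes from the comment.

```java
import java.io.IOException;

// Hypothetical sketch, not code from hbase-5003-v2.patch.
class VersionFileRetry {

    /** The update comment mentions 3 attempts by default (configurable). */
    static final int DEFAULT_ATTEMPTS = 3;

    /** Stand-in for whatever actually writes the version file (assumption). */
    interface VersionWriter {
        void write() throws IOException;
    }

    /**
     * Tries the write at most maxAttempts times, sleeping waitMillis between
     * failures. Returns true on success, false once the attempts are used up
     * or the thread is interrupted, so the caller can decide to shut down.
     */
    static boolean writeWithRetries(VersionWriter writer, int maxAttempts, long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                writer.write();
                return true;
            } catch (IOException e) {
                System.err.println("Unable to create version file (attempt "
                        + attempt + " of " + maxAttempts + "): " + e.getMessage());
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying so the
                    // thread can be stopped instead of looping.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated writer that always fails, as with an unusable root dir.
        boolean ok = writeWithRetries(() -> {
            throw new IOException("Mkdirs failed to create file:/bin/hbase");
        }, DEFAULT_ATTEMPTS, 10L);
        if (!ok) {
            System.err.println("Giving up; a caller would begin a graceful shutdown here.");
        }
    }
}
```

The point of returning a boolean rather than looping is that the thread described in the stack trace above would no longer sit in Thread.sleep indefinitely; what the caller does on failure is left open here.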