[ 
https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaneal Manek updated HBASE-5003:
---------------------------------

    Status: Patch Available  (was: Open)

This patch simply has the master retry writing the version file 3 times (the 
default, which is configurable). If every attempt fails, the master shuts down 
gracefully.
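
For readers who have not opened the attachment, here is a minimal sketch of 
the bounded-retry idea described above. It is not the code from the patch: 
the class BoundedRetry, the Attempt interface, and the runWithRetries method 
are hypothetical names chosen for illustration, and the wait time is a plain 
parameter rather than any real HBase setting.

```java
import java.io.IOException;

/** Hypothetical illustration of a bounded retry; not taken from the patch. */
final class BoundedRetry {

    /** A write that may fail, standing in for the version-file write. */
    interface Attempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt up to maxAttempts times, sleeping waitMillis after each
     * failure except the last. Once the attempts are exhausted, the last
     * IOException is rethrown so the caller can shut down instead of looping
     * forever.
     */
    static void runWithRetries(Attempt attempt, int maxAttempts, long waitMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return; // success, nothing more to do
            } catch (IOException e) {
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(waitMillis);
                }
            }
        }
        throw last; // non-null here because at least one attempt failed
    }
}
```

A caller would catch the rethrown IOException and begin an orderly shutdown, 
which is the behavior the unbounded sleep loop in the stack trace below 
cannot provide.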

Please disregard the first patch; it accidentally includes the buggy 
hbase-site.xml I was using to reproduce this issue.
                
> If the master is started with a wrong root dir, it gets stuck and can't be 
> killed
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-5003
>                 URL: https://issues.apache.org/jira/browse/HBASE-5003
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.94.0, 0.90.7, 0.92.1
>
>         Attachments: hbase-5003-v2.patch, hbase-5003.patch
>
>
> A new user on IRC reported that after setting hbase.rootdir to 
> file:///~/hbase, the master gets stuck and cannot be killed. I tried 
> something similar on my machine and it spins while logging:
> {quote}
> 2011-12-09 16:11:17,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
> 2011-12-09 16:11:27,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
> 2011-12-09 16:11:37,003 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase
> {quote}
> The reason it cannot be stopped is that the master's main thread is stuck in 
> there and will never be notified:
> {quote}
> "Master:0;su-jdcryans-01.local,51116,1323475535684" prio=5 tid=7f92b7a3c000 
> nid=0x1137ba000 waiting on condition [1137b9000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>       at java.lang.Thread.sleep(Native Method)
>       at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:297)
>       at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:268)
>       at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:339)
>       at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
>       at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:113)
>       at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
>       at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
>       at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:218)
>       at java.lang.Thread.run(Thread.java:680)
> {quote}
> It seems we should handle the exceptions we get in there better, and die if 
> we need to. That would make for a better user experience.
> Maybe we should also check hbase.rootdir before even starting the master.
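
As a rough illustration of the up-front check suggested at the end of the 
description, the sketch below inspects a root directory string before 
anything tries to write to it. Everything here is an assumption made for the 
example: RootDirPreflight and problemWith are hypothetical names, the rules 
(absolute path, no literal "~" segment) are only the ones suggested by the 
file:///~/hbase report, and it uses java.net.URI rather than any HBase or 
Hadoop API.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical pre-flight check; not part of HBase. */
final class RootDirPreflight {

    /**
     * Returns null if the value passes these simple checks, otherwise a short
     * description of the problem. Passing does not prove the directory is
     * writable; it only catches mistakes visible in the string itself.
     */
    static String problemWith(String rootDir) {
        if (rootDir == null || rootDir.trim().isEmpty()) {
            return "hbase.rootdir is not set";
        }
        URI uri;
        try {
            uri = new URI(rootDir.trim());
        } catch (URISyntaxException e) {
            return "not a valid URI: " + e.getMessage();
        }
        String path = uri.getPath();
        if (path == null || !path.startsWith("/")) {
            return "path is not absolute";
        }
        // A shell would expand "~", but a URI treats it as a literal name.
        for (String segment : path.split("/")) {
            if (segment.equals("~")) {
                return "'~' is not expanded in a URI; use an absolute path";
            }
        }
        return null;
    }
}
```

With these rules, file:///~/hbase is rejected because of the literal "~" 
segment, while a value such as hdfs://namenode:8020/hbase passes.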

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
