[jira] [Commented] (HBASE-16056) Procedure v2 - fix master crash for FileNotFound

Matteo Bertozzi (JIRA) Fri, 17 Jun 2016 06:06:36 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-16056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336001#comment-15336001
 ]


Matteo Bertozzi commented on HBASE-16056:
-----------------------------------------

No, this is a race condition; you should not be stuck there forever.
Either the other master goes down after completing the operation,
or the one that is spinning gets aborted. We are spinning in a
while (isRunning()) loop.
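
To make the spinning concrete, here is a minimal sketch of that kind of loop. It is not the attached patch: LogStore, recoverAll and retryDelayMs are hypothetical names made up for illustration, and the file system is replaced by a stand-in interface. The only assumptions are that the recovery call can throw FileNotFoundException when the other master deletes a log after it was listed, and that re-listing and retrying is safe.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names come from HBase.
class LeaseRecoveryRetrySketch {

  /** Stand-in for the file system calls; an assumption for illustration only. */
  interface LogStore {
    List<String> listLogs() throws IOException;
    // May throw FileNotFoundException if the file vanished after listing.
    void recoverLease(String path) throws IOException;
  }

  /**
   * Re-lists the logs and retries whenever a file vanishes mid-recovery.
   * Returns the recovered logs, or null if isRunning turned false first.
   */
  static List<String> recoverAll(LogStore store, BooleanSupplier isRunning, long retryDelayMs)
      throws IOException, InterruptedException {
    while (isRunning.getAsBoolean()) {
      List<String> logs = store.listLogs();
      try {
        for (String log : logs) {
          store.recoverLease(log);
        }
        return logs;
      } catch (FileNotFoundException e) {
        // The other master deleted a log after we listed it: retry with a fresh listing.
        Thread.sleep(retryDelayMs);
      }
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    List<String> files =
        new ArrayList<>(Arrays.asList("state-000001.log", "state-000002.log"));
    AtomicBoolean deleted = new AtomicBoolean(false);
    LogStore store = new LogStore() {
      public List<String> listLogs() {
        return new ArrayList<>(files);
      }
      public void recoverLease(String path) throws IOException {
        // Simulate the old master removing the first log once, after it was listed.
        if (path.equals("state-000001.log") && deleted.compareAndSet(false, true)) {
          files.remove(path);
          throw new FileNotFoundException("File does not exist: " + path);
        }
      }
    };
    System.out.println(recoverAll(store, () -> true, 10L));
  }
}
```

The loop ends in one of two ways, matching the two outcomes above: the recovery pass succeeds once the other master has stopped touching the files, or isRunning() turns false because the spinning master was aborted.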

> Procedure v2 - fix master crash for FileNotFound
> ------------------------------------------------
>
>                 Key: HBASE-16056
>                 URL: https://issues.apache.org/jira/browse/HBASE-16056
>             Project: HBase
>          Issue Type: Sub-task
>          Components: proc-v2
>    Affects Versions: 2.0.0, 1.3.0, 1.2.1, 1.1.5
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0, 1.2.2, 1.1.6
>
>         Attachments: HBASE-16056-v0.patch, HBASE-16056-v1.patch, 
> HBASE-16056-v2.patch
>
>
> [~syuanjiang] and [~tedyu] reported a backup master not able to start with 
> FileNotFound during proc-v2 lease recovery. (another restart should have 
> solved the problem)
> {noformat}
> FileNotFoundException: File does not exist: /hbase/MasterProcWALs/state-000001.log
> namenode.INodeFile.valueOf(INodeFile.java:61)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2877)
>  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:753)
>  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:671)
> {noformat}
> This may happen when the old master is still active (e.g. after a GC pause)
> and tries to remove files while the backup master tries to become active.
> This operation is retryable, so the code should be able to handle that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
