[ https://issues.apache.org/jira/browse/HBASE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434943#comment-16434943 ]
Hudson commented on HBASE-20330:
--------------------------------

Results for branch branch-2.0
[build #161 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/161/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/161//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/161//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/161//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

> ProcedureExecutor.start() gets stuck in recover lease on store.
> ---------------------------------------------------------------
>
>                 Key: HBASE-20330
>                 URL: https://issues.apache.org/jira/browse/HBASE-20330
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>   Affects Versions: 2.0.0-beta-2
>            Reporter: Umesh Agashe
>            Assignee: Umesh Agashe
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: hbase-20330.master.001.patch, hbase-20330.master.002.patch, hbase-20330.master.003.patch, hbase-20330.master.004.patch, hbase-20330.master.005.patch
>
>
> We have an instance in our internal testing where the master log is getting filled with the following messages:
> {code}
> 2018-04-02 17:11:17,566 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recover lease on dfs file hdfs://ns1/hbase/MasterProcWALs/pv2-00000000000000000018.log
> 2018-04-02 17:11:17,567 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://ns1/hbase/MasterProcWALs/pv2-00000000000000000018.log after 1ms
> 2018-04-02 17:11:17,574 WARN org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Unable to read tracker for hdfs://ns1/hbase/MasterProcWALs/pv2-00000000000000000018.log - Invalid Trailer version. got 111 expected 1
> 2018-04-02 17:11:17,576 ERROR org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Log file with id=19 already exists
> org.apache.hadoop.fs.FileAlreadyExistsException: /hbase/MasterProcWALs/pv2-00000000000000000019.log for client 10.17.202.11 already exists
>         at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:381)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2442)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2339)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:764)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:451)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> Debugging it further with [~appy], [~avirmani] and [~xiaochen], we found that when WALProcedureStore#rollWriter() fails and returns false for some reason, it keeps looping continuously.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
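
For readers following the issue, the failure mode described above (a roll step that keeps returning false and is retried without limit) can be illustrated with a small standalone sketch. This is not the HBase code and not the attached patch: the class `BoundedRollRetry`, the method `rollWithLimit`, and the `maxAttempts` parameter are hypothetical names used only to show one way a caller might bound such retries and surface the failure instead of spinning.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; none of these names exist in HBase.
class BoundedRollRetry {

    /**
     * Calls rollOnce until it returns true or maxAttempts calls have been made.
     * Returns the 1-based attempt number that succeeded, or -1 if all attempts failed.
     */
    public static int rollWithLimit(BooleanSupplier rollOnce, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (rollOnce.getAsBoolean()) {
                return attempt;
            }
            // An unbounded loop would simply retry here forever;
            // the bound lets the caller report the failure instead.
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in for a roll step that always returns false.
        System.out.println(rollWithLimit(() -> false, 5)); // prints -1

        // Stand-in for a roll step that succeeds on its third call.
        int[] calls = {0};
        System.out.println(rollWithLimit(() -> ++calls[0] == 3, 5)); // prints 3
    }
}
```

Whether the caller should abort, back off, or pick a different log id after the bound is reached is a design decision this sketch leaves open.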