[jira] [Commented] (HBASE-14729) SplitLogManager does not clean files from WALs folder in case of master failover

Hadoop QA (JIRA) Fri, 30 Oct 2015 09:48:18 -0700

    [ https://issues.apache.org/jira/browse/HBASE-14729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982847#comment-14982847 ]


Hadoop QA commented on HBASE-14729:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12769781/HBASE-14729.patch
  against master branch at commit 23fa18184cb68ca05246beb2189f8801200bdd7c.
  ATTACHMENT ID: 12769781

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1)

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the total number of checkstyle errors.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100 characters.

    {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

    {color:red}-1 core tests{color}.  The patch failed these unit tests:


Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16308//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16308//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16308//artifact/patchprocess/checkstyle-aggregate.html

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16308//console

This message is automatically generated.

> SplitLogManager does not clean files from WALs folder in case of master failover
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-14729
>                 URL: https://issues.apache.org/jira/browse/HBASE-14729
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.0.0
>            Reporter: Samir Ahmic
>            Assignee: Samir Ahmic
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14729.patch
>
>
> While I was testing the master failover process on the master branch (distributed cluster setup), I noticed the following:
> 1. The list of dead regionservers grew every time the active master was restarted.
> 2. The number of folders in /hbase/WALs grew every time the active master was restarted.
> Here is the exception from the master logs showing why this happens:
> {code}
> 2015-10-30 09:41:49,238 INFO  [ProcedureExecutor-3] master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://P3cluster/hbase/WALs/hnode1,16000,1446043659224-splitting] in 21ms
> 2015-10-30 09:41:49,235 WARN  [ProcedureExecutor-2] master.SplitLogManager: Returning success without actually splitting and deleting all the log files in path hdfs://P3cluster/hbase/WALs/hnode1,16000,1446046595488-splitting: [FileStatus{path=hdfs://P3cluster/hbase/WALs/hnode1,16000,1446046595488-splitting/hnode1%2C16000%2C1446046595488.meta.1446046691314.meta; isDirectory=false; length=39944; replication=3; blocksize=268435456; modification_time=1446050348104; access_time=1446046691317; owner=hbase; group=supergroup; permission=rw-r--r--; isSymlink=false}]
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.PathIsNotEmptyDirectoryException): `/hbase/WALs/hnode1,16000,1446046595488-splitting is non empty': Directory is not empty
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3524)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3479)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3463)
>       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:751)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:562)
>       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>       at com.sun.proxy.$Proxy15.delete(Unknown Source)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:490)
>       at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy16.delete(Unknown Source)
>       at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
>       at com.sun.proxy.$Proxy17.delete(Unknown Source)
>       at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
>       at com.sun.proxy.$Proxy17.delete(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1726)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:588)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:584)
>       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:584)
>       at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:297)
>       at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:400)
>       at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:373)
>       at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:295)
>       at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:388)
>       at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:228)
>       at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:72)
>       at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
>       at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:452)
>       at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1050)
>       at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:841)
>       at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:794)
>       at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:75)
>       at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:479)
> {code}
> I have tracked the exception to this line in SplitLogManager#splitLogDistributed (SplitLogManager.java:297):
> {code}
> if (fs.exists(logDir) && !fs.delete(logDir, false))
> {code}
> Since we are removing a folder, we need to delete it recursively, so this line should be:
> {code}
> if (fs.exists(logDir) && !fs.delete(logDir, true))
> {code}
> This solved the issue. I will attach a patch after some additional testing.
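To make the difference between the two calls concrete without an HDFS cluster, here is a minimal sketch that uses java.nio.file on the local filesystem as an analogue of the recursive flag in Hadoop's FileSystem#delete(Path, boolean). This is not the HBase or HDFS code path. The class RecursiveDeleteDemo and its delete helper are hypothetical. Also, locally a non-recursive delete of a non-empty directory throws an IOException (DirectoryNotEmptyException on common platforms) rather than the PathIsNotEmptyDirectoryException seen in the log above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: a local-filesystem analogue of the recursive flag
// in FileSystem#delete(Path, boolean). It is not HBase or HDFS code.
public class RecursiveDeleteDemo {

  // Hypothetical helper. Returns false if dir does not exist, true once it is removed.
  // With recursive == false, Files.delete throws an IOException for a non-empty
  // directory, much like the non-recursive HDFS delete in the report.
  static boolean delete(Path dir, boolean recursive) throws IOException {
    if (!Files.exists(dir)) {
      return false;
    }
    if (!recursive) {
      Files.delete(dir);
      return true;
    }
    // Collect the whole tree, then delete in reverse path order: a parent path
    // always sorts before its descendants, so children are removed first.
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    // Mimic a "-splitting" directory that still holds one file.
    // The names are illustrative only.
    Path root = Files.createTempDirectory("wals");
    Path logDir = Files.createDirectory(root.resolve("hnode1,16000,1446046595488-splitting"));
    Files.createFile(logDir.resolve("leftover.meta"));

    try {
      delete(logDir, false);
      System.out.println("non-recursive delete removed " + logDir);
    } catch (IOException e) {
      System.out.println("non-recursive delete failed: " + e.getClass().getSimpleName());
    }

    boolean removed = delete(logDir, true);
    System.out.println("recursive delete removed directory: " + removed);
    Files.delete(root); // clean up the now-empty temporary parent
  }
}
```

On a typical Linux or macOS JVM, main should report that the non-recursive delete fails while the recursive delete removes the directory. This mirrors the failure and the one-flag change described in the issue. Collecting the paths into a list first keeps Files.delete, which throws a checked IOException, out of a stream lambda.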



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
