Samir Ahmic created HBASE-14729:
-----------------------------------

             Summary: SplitLogManager does not clean files from WALs folder in case of master failover
                 Key: HBASE-14729
                 URL: https://issues.apache.org/jira/browse/HBASE-14729
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 2.0.0
            Reporter: Samir Ahmic
            Assignee: Samir Ahmic


While testing the master failover process on the master branch (distributed 
cluster setup), I noticed the following:
1. The list of dead regionservers grew every time the active master was 
restarted.
2. The number of folders in the /hbase/WALs folder grew every time the active 
master was restarted.

Here is the exception from the master logs showing why this is happening:
{code}
2015-10-30 09:41:49,238 INFO  [ProcedureExecutor-3] master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://P3cluster/hbase/WALs/hnode1,16000,1446043659224-splitting] in 21ms
2015-10-30 09:41:49,235 WARN  [ProcedureExecutor-2] master.SplitLogManager: Returning success without actually splitting and deleting all the log files in path hdfs://P3cluster/hbase/WALs/hnode1,16000,1446046595488-splitting: [FileStatus{path=hdfs://P3cluster/hbase/WALs/hnode1,16000,1446046595488-splitting/hnode1%2C16000%2C1446046595488.meta.1446046691314.meta; isDirectory=false; length=39944; replication=3; blocksize=268435456; modification_time=1446050348104; access_time=1446046691317; owner=hbase; group=supergroup; permission=rw-r--r--; isSymlink=false}]
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.PathIsNotEmptyDirectoryException): `/hbase/WALs/hnode1,16000,1446046595488-splitting is non empty': Directory is not empty
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3524)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3479)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3463)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:751)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:562)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at org.apache.hadoop.ipc.Client.call(Client.java:1411)
        at org.apache.hadoop.ipc.Client.call(Client.java:1364)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy15.delete(Unknown Source)
{code}

I have tracked the exception to this line (line 297) in 
SplitLogManager#splitLogDistributed:
{code}
if (fs.exists(logDir) && !fs.delete(logDir, false))
{code}

Since we are removing a folder that may still contain files, we need to delete 
recursively, so this line should be:
{code}
if (fs.exists(logDir) && !fs.delete(logDir, true))
{code}
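
To illustrate the difference between the two calls, here is a minimal sketch that reproduces the same failure mode on a local filesystem. It does not use the Hadoop API: java.nio.file stands in for HDFS, and the class name RecursiveDeleteSketch, the delete helper and the file name leftover.meta are hypothetical names made up for this example. It only demonstrates that a non-recursive delete of a non-empty directory fails, while a recursive delete removes the directory and its contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveDeleteSketch {

    // Hypothetical helper: a local-filesystem stand-in for a
    // delete(path, recursive) call. Not the Hadoop implementation.
    public static void delete(Path dir, boolean recursive) throws IOException {
        if (!recursive) {
            // Fails with an IOException if the directory still has children.
            Files.delete(dir);
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // A "-splitting" style directory with one leftover file in it.
        Path splitting = Files.createTempDirectory("hnode1-splitting");
        Files.createFile(splitting.resolve("leftover.meta"));

        try {
            delete(splitting, false);
        } catch (IOException e) {
            System.out.println("non-recursive delete failed: " + e);
        }
        System.out.println("still exists: " + Files.exists(splitting));

        delete(splitting, true);
        System.out.println("exists after recursive delete: " + Files.exists(splitting));
    }
}
```

In the scenario above, the leftover .meta file in the -splitting directory plays the role of leftover.meta in this sketch.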

This solved the issue. I will attach a patch after some additional testing.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
