[ https://issues.apache.org/jira/browse/HBASE-12201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164138#comment-14164138 ]

Jonathan Hsieh commented on HBASE-12201:
----------------------------------------

Some more context/information:

This happens after the MR job and all its mapping and reducing have completed, 
and seems to be triggered by the cleanup() call when it tries to close the dir 
with the list of live files.

At worst, I believe the thrown exception blocked the timely removal of the zk 
node, but all the main work of the sweeper job still completes.

Does this sound right [~jingchengdu]?

Anyway, I've tested it on a yarn cluster now and have been able to reproduce 
and see that this patch fixes the problem.
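
For reference, the general shape of the fix is to make sure every stream opened 
with fs.create() is closed before the client shuts down. A minimal sketch 
(illustrative names only, not the actual SweepJob code), using just the public 
FileSystem API:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class WriterCloseSketch {
  // Hypothetical helper: write the list of live file names, then close the
  // stream even if the write fails, so no HDFS lease is left dangling when
  // the DFSClient shutdown hook runs.
  static void writeFileNames(FileSystem fs, Path allNamesFile, Iterable<String> names)
      throws IOException {
    FSDataOutputStream out = fs.create(allNamesFile, true);
    try {
      for (String name : names) {
        out.writeBytes(name + "\n");
      }
    } finally {
      // Closing completes the file on the NameNode; an unclosed stream is
      // what produces the LeaseExpiredException seen in the stack trace below.
      IOUtils.closeStream(out);
    }
  }
}

Once the stream is closed, the NameNode completes the file and there is no open 
lease for the shutdown hook to trip over.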

> Close the writers in the MOB sweep tool
> ---------------------------------------
>
>                 Key: HBASE-12201
>                 URL: https://issues.apache.org/jira/browse/HBASE-12201
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: hbase-11339
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>            Priority: Minor
>             Fix For: hbase-11339
>
>         Attachments: HBASE-12201-V2.diff, HBASE-12201.diff
>
>
>  When running the sweep tool, we encountered the following exception.
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /hbase/mobdir/.tmp/mobcompaction/SweepJob-SweepMapper-SweepReducer-testSweepToolExpiredNoMinVersion-data/working/names/all (inode 46500): File does not exist. Holder DFSClient_NONMAPREDUCE_-1863270027_1 does not have any open files.
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3319)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3407)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3377)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:673)
>   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:219)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:520)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy15.complete(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:435)
>   at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy16.complete(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2180)
>   at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2164)
>   at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:908)
>   at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:925)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:861)
>   at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2687)
>   at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2704)
>   at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> This is because several writers opened by fs.create(path, true) are not 
> closed properly.
> Meanwhile, in the current implementation, we save the temp files under 
> sweepJobDir/working/..., and when we remove the directory of the sweep job, 
> only the working directory is deleted. We should remove the whole sweepJobDir instead.
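
Regarding the last point in the quoted description, a minimal sketch of what 
deleting the whole per-job directory could look like (sweepJobDir here is a 
hypothetical Path parameter, not the actual field name in the tool):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SweepDirCleanupSketch {
  // Hypothetical cleanup: remove the whole per-job directory recursively
  // instead of only the working/ subdirectory underneath it.
  static void deleteSweepJobDir(Configuration conf, Path sweepJobDir) throws IOException {
    FileSystem fs = sweepJobDir.getFileSystem(conf);
    // Deleting only new Path(sweepJobDir, "working") would leave the
    // now-empty sweepJobDir behind; deleting sweepJobDir removes everything.
    fs.delete(sweepJobDir, true);
  }
}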



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
