[
https://issues.apache.org/jira/browse/HBASE-12201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164573#comment-14164573
]
Jingcheng Du commented on HBASE-12201:
--------------------------------------
Thanks Jon, [~jmhsieh].
bq. changes the compaction working dir so that multiple jobs or subsequent jobs
don't use the same path.
Previously the working dir was mobCompactionDirOfJobName/working/...
(the job name is mapperclass-reducerclass-table-cf). When we removed the working
dir, only working was deleted and mobCompactionDirOfJobName was left behind.
It doesn't impact the logic, but it's better to remove the whole directory, as
in the sketch below.
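A minimal sketch of the cleanup change, with hypothetical names (the real change is in the patch; jobDir stands for mobCompactionDirOfJobName):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only; class and method names are illustrative, not from the patch.
public class SweepJobCleanup {
  static void removeJobDir(Configuration conf, Path jobDir) throws IOException {
    FileSystem fs = jobDir.getFileSystem(conf);
    // Before: only jobDir/working was deleted, leaving the empty jobDir behind:
    //   fs.delete(new Path(jobDir, "working"), true);
    // After: delete the whole per-job directory recursively.
    fs.delete(jobDir, true);
  }
}
{code}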
bq. Can you give a quick explanation of why or how we would potentially open a
file for append? (I believe there is an obscure race possible there: two get
to !exists and then both try to create, but let's ignore that for now).
We have a lock that surrounds the logic, right? Only one sweeper can run at a
time for the same table and cf, so there is no race condition here.
Actually we don't need to check for existence here at all (1. the whole working
dir is deleted and re-created at the beginning of the sweeper; 2. the file name
is a UUID), so we can create the file directly; a sketch follows. Will provide
a new patch (V3) to fix this.
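For illustration, a minimal sketch of what the V3 change could look like, assuming the writer is created straight from the FileSystem (the class and method names here are hypothetical, not from the patch):

{code}
import java.io.IOException;
import java.util.UUID;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only; names are illustrative, not from the patch.
public class CreateWithoutExistsCheck {
  static FSDataOutputStream newWorkingFile(FileSystem fs, Path workingDir)
      throws IOException {
    // The working dir was freshly re-created by the sweeper and the file name
    // is a random UUID, so the exists() check (and the append branch it
    // guarded) is unnecessary: just create the file.
    Path file = new Path(workingDir, UUID.randomUUID().toString());
    return fs.create(file, true);
  }
}
{code}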
> Close the writers in the MOB sweep tool
> ---------------------------------------
>
> Key: HBASE-12201
> URL: https://issues.apache.org/jira/browse/HBASE-12201
> Project: HBase
> Issue Type: Bug
> Affects Versions: hbase-11339
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Priority: Minor
> Fix For: hbase-11339
>
> Attachments: HBASE-12201-V2.diff, HBASE-12201.diff
>
>
> When running the sweep tool, we encountered such an exception.
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /hbase/mobdir/.tmp/mobcompaction/SweepJob-SweepMapper-SweepReducer-testSweepToolExpiredNoMinVersion-data/working/names/all (inode 46500): File does not exist. Holder DFSClient_NONMAPREDUCE_-1863270027_1 does not have any open files.
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3319)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3407)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3377)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:673)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:219)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:520)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at org.apache.hadoop.ipc.Client.call(Client.java:1411)
> at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy15.complete(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:435)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy16.complete(Unknown Source)
> at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2180)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2164)
> at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:908)
> at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:925)
> at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:861)
> at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2687)
> at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2704)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> This is because several writers opened by fs.create(path, true) are not
> closed properly (see the sketch after this description).
> Meanwhile, in the current implementation, we save the temp files under
> sweepJobDir/working/..., and when we remove the directory of the sweep job,
> only the working subdirectory is deleted. We should remove the whole
> sweepJobDir instead.
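For reference, a minimal sketch of the close fix the issue title implies, assuming the writers wrap org.apache.hadoop.fs.FSDataOutputStream (illustrative names, not the actual patch):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only; names are illustrative, not from the patch.
public class CloseWriters {
  static void writeAndClose(FileSystem fs, Path path, byte[] data)
      throws IOException {
    FSDataOutputStream out = fs.create(path, true);
    try {
      out.write(data);
    } finally {
      // Without this close(), the stream is only closed by the DFSClient
      // shutdown hook, which runs after the working dir has been removed
      // and fails with the LeaseExpiredException above.
      out.close();
    }
  }
}
{code}

Closing each stream eagerly releases its HDFS lease, so the ShutdownHookManager hook no longer tries to complete a file whose path has already been deleted.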