[
https://issues.apache.org/jira/browse/HDFS-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ming Ma updated HDFS-6478:
--------------------------
Attachment: HDFS-6478-3.patch
Updated the patch now that Jing has committed
https://issues.apache.org/jira/browse/HADOOP-10673.
Please note that TestFileCreation#testOverwriteOpenForWrite will take around 5
minutes to run after the patch. That is because AlreadyBeingCreatedException
will be retried, and NameNodeProxies.java uses LEASE_SOFTLIMIT_PERIOD as the
retryUpToMaximumCountWithFixedSleep parameter. We can improve this by making the
NN lease soft and hard limits configurable via new Configuration parameters and
having NameNodeProxies.java use the configured values. We can open a new jira if
that is necessary.
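For the longer-term improvement mentioned above, a minimal sketch of what configurable lease limits could look like. This is illustrative only: the configuration key names are hypothetical (not existing Hadoop keys), and a plain Map stands in for Hadoop's Configuration class; the defaults mirror the current hard-coded 1-minute soft limit and 1-hour hard limit.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed change: lease soft/hard limits read from
// configuration instead of hard-coded constants. The key names and the
// Map-based "Configuration" are hypothetical stand-ins, not Hadoop APIs.
public class LeaseLimits {
  // Current hard-coded values: 1-minute soft limit, 1-hour hard limit.
  static final long DEFAULT_SOFT_LIMIT_MS = 60 * 1000L;
  static final long DEFAULT_HARD_LIMIT_MS = 60 * 60 * 1000L;

  final long softLimitMs;
  final long hardLimitMs;

  LeaseLimits(Map<String, String> conf) {
    // Hypothetical key names for illustration only.
    this.softLimitMs = getLong(conf, "dfs.namenode.lease.soft-limit-ms", DEFAULT_SOFT_LIMIT_MS);
    this.hardLimitMs = getLong(conf, "dfs.namenode.lease.hard-limit-ms", DEFAULT_HARD_LIMIT_MS);
  }

  static long getLong(Map<String, String> conf, String key, long dflt) {
    String v = conf.get(key);
    return v == null ? dflt : Long.parseLong(v);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("dfs.namenode.lease.soft-limit-ms", "5000"); // shorten for tests
    LeaseLimits limits = new LeaseLimits(conf);
    System.out.println(limits.softLimitMs + " " + limits.hardLimitMs);
  }
}
```

With a scheme like this, a test such as testOverwriteOpenForWrite could shorten the soft limit and avoid the 5-minute retry wait.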
> RemoteException can't be retried properly for non-HA scenario
> -------------------------------------------------------------
>
> Key: HDFS-6478
> URL: https://issues.apache.org/jira/browse/HDFS-6478
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: HDFS-6478-2.patch, HDFS-6478-3.patch, HDFS-6478.patch
>
>
> For the HA case, the call stack is DFSClient -> RetryInvocationHandler ->
> ClientNamenodeProtocolTranslatorPB -> ProtobufRpcEngine. ProtobufRpcEngine
> throws ServiceException and expects the caller to unwrap it;
> ClientNamenodeProtocolTranslatorPB is the component that takes care of
> that.
> {noformat}
> at org.apache.hadoop.ipc.Client.call
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
> at com.sun.proxy.$Proxy26.getFileInfo
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo
> at sun.reflect.GeneratedMethodAccessor24.invoke
> at sun.reflect.DelegatingMethodAccessorImpl.invoke
> at java.lang.reflect.Method.invoke
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
> at com.sun.proxy.$Proxy27.getFileInfo
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus
> {noformat}
> However, for the non-HA case, the call stack is DFSClient ->
> ClientNamenodeProtocolTranslatorPB -> RetryInvocationHandler ->
> ProtobufRpcEngine. RetryInvocationHandler gets the raw ServiceException, so
> the underlying RemoteException can't be retried properly.
> {noformat}
> at org.apache.hadoop.ipc.Client.call
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
> at com.sun.proxy.$Proxy9.getListing
> at sun.reflect.NativeMethodAccessorImpl.invoke0
> at sun.reflect.NativeMethodAccessorImpl.invoke
> at sun.reflect.DelegatingMethodAccessorImpl.invoke
> at java.lang.reflect.Method.invoke
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
> at com.sun.proxy.$Proxy9.getListing
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing
> at org.apache.hadoop.hdfs.DFSClient.listPaths
> {noformat}
> Perhaps we can fix it by having NN wrap RetryInvocationHandler around
> ClientNamenodeProtocolTranslatorPB and the other PB translators, instead of
> the current wrap order.
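The wrap-order problem can be illustrated with a self-contained sketch using plain java.lang.reflect proxies. This is not Hadoop code: the interface and exception classes below are simplified stand-ins. The point is that a retry layer keyed on RemoteException only helps if it sits *outside* the layer that unwraps ServiceException.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

public class WrapOrderDemo {
  // Simplified stand-ins for the real exception types -- not Hadoop classes.
  static class ServiceException extends Exception {
    ServiceException(Throwable cause) { super(cause); }
  }
  static class RemoteException extends Exception {}

  interface ClientProtocol { String getFileInfo(String path) throws Exception; }

  // RPC layer: the first call fails with a RemoteException wrapped in a
  // ServiceException (as ProtobufRpcEngine does); later calls succeed.
  static ClientProtocol rpcLayer() {
    final int[] calls = {0};
    return (ClientProtocol) Proxy.newProxyInstance(
        ClientProtocol.class.getClassLoader(), new Class<?>[] {ClientProtocol.class},
        (proxy, method, methodArgs) -> {
          if (calls[0]++ == 0) throw new ServiceException(new RemoteException());
          return "fileInfo";
        });
  }

  // Translator layer: unwraps ServiceException into its RemoteException
  // cause, like ClientNamenodeProtocolTranslatorPB.
  static ClientProtocol translator(ClientProtocol inner) {
    return (ClientProtocol) Proxy.newProxyInstance(
        ClientProtocol.class.getClassLoader(), new Class<?>[] {ClientProtocol.class},
        (proxy, method, methodArgs) -> {
          try {
            return method.invoke(inner, methodArgs);
          } catch (InvocationTargetException e) {
            Throwable t = e.getCause();
            throw (t instanceof ServiceException) ? t.getCause() : t;
          }
        });
  }

  // Retry layer: retries once, but only when it sees a RemoteException.
  // A ServiceException is opaque to it, mirroring a RetryInvocationHandler
  // policy keyed on the unwrapped exception type.
  static ClientProtocol retrying(ClientProtocol inner) {
    return (ClientProtocol) Proxy.newProxyInstance(
        ClientProtocol.class.getClassLoader(), new Class<?>[] {ClientProtocol.class},
        (proxy, method, methodArgs) -> {
          for (int attempt = 0; ; attempt++) {
            try {
              return method.invoke(inner, methodArgs);
            } catch (InvocationTargetException e) {
              Throwable t = e.getCause();
              if (t instanceof RemoteException && attempt < 1) continue; // retry
              throw t;
            }
          }
        });
  }

  public static void main(String[] args) throws Exception {
    // HA-style order: retry wraps the translator, sees the unwrapped
    // RemoteException, retries, and the call succeeds.
    System.out.println(retrying(translator(rpcLayer())).getFileInfo("/f")); // fileInfo

    // Non-HA-style order: retry wraps the raw RPC proxy, sees only the
    // ServiceException, and gives up; the error propagates un-retried.
    try {
      translator(retrying(rpcLayer())).getFileInfo("/f");
    } catch (RemoteException e) {
      System.out.println("not retried");
    }
  }
}
```

Swapping the wrap order as proposed above corresponds to the first composition, `retrying(translator(rpc))`.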
--
This message was sent by Atlassian JIRA
(v6.2#6252)