[ 
https://issues.apache.org/jira/browse/HDFS-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-6478:
--------------------------

    Attachment: HDFS-6478-3.patch

Updated the patch now that Jing has committed 
https://issues.apache.org/jira/browse/HADOOP-10673.

Please note that TestFileCreation#testOverwriteOpenForWrite will run for around 
5 minutes after the patch. That is because AlreadyBeingCreatedException will be 
retried, and NamenodeProxies.java uses LEASE_SOFTLIMIT_PERIOD as the sleep 
interval for retryUpToMaximumCountWithFixedSleep. We can improve this by making 
the NN lease soft and hard limits configurable via new Configuration parameters 
so that NamenodeProxies.java uses the configured values. We can open a new jira 
for that if necessary.
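
For reference, the ~5 minute runtime comes from the fixed-sleep retry policy 
built in NamenodeProxies.java. A rough sketch of the arithmetic (the retry 
count of 5 is an assumption about the current code, not part of this patch; 
LEASE_SOFTLIMIT_PERIOD is 60 seconds):
{noformat}
// Sketch of the policy applied to the create() retries (assumed values):
// retry a fixed number of times, sleeping LEASE_SOFTLIMIT_PERIOD (60,000 ms)
// between attempts.
org.apache.hadoop.io.retry.RetryPolicy createPolicy =
    org.apache.hadoop.io.retry.RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        5,  // assumed retry count
        org.apache.hadoop.hdfs.protocol.HdfsConstants.LEASE_SOFTLIMIT_PERIOD,
        java.util.concurrent.TimeUnit.MILLISECONDS);
// 5 retries x 60 s fixed sleep ~= 5 minutes, which matches the observed
// testOverwriteOpenForWrite runtime.
{noformat}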

> RemoteException can't be retried properly for non-HA scenario
> -------------------------------------------------------------
>
>                 Key: HDFS-6478
>                 URL: https://issues.apache.org/jira/browse/HDFS-6478
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-6478-2.patch, HDFS-6478-3.patch, HDFS-6478.patch
>
>
> For the HA case, the call stack is DFSClient -> RetryInvocationHandler -> 
> ClientNamenodeProtocolTranslatorPB -> ProtobufRpcEngine. ProtobufRpcEngine 
> throws ServiceException and expects the caller to unwrap it; 
> ClientNamenodeProtocolTranslatorPB is the component that takes care of that.
> {noformat}
>         at org.apache.hadoop.ipc.Client.call
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
>         at com.sun.proxy.$Proxy26.getFileInfo
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo
>         at sun.reflect.GeneratedMethodAccessor24.invoke
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke
>         at java.lang.reflect.Method.invoke
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
>         at com.sun.proxy.$Proxy27.getFileInfo
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus
> {noformat}
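> For illustration, this is roughly how the PB translator unwraps the 
> ServiceException so that the layer above it sees a plain 
> IOException/RemoteException (a sketch modeled on getFileInfo, not taken from 
> the patch):
> {noformat}
> public HdfsFileStatus getFileInfo(String src) throws IOException {
>   GetFileInfoRequestProto req =
>       GetFileInfoRequestProto.newBuilder().setSrc(src).build();
>   try {
>     GetFileInfoResponseProto res = rpcProxy.getFileInfo(null, req);
>     return res.hasFs() ? PBHelper.convert(res.getFs()) : null;
>   } catch (ServiceException e) {
>     // Unwrap the ServiceException thrown by ProtobufRpcEngine into the
>     // underlying RemoteException/IOException before it propagates up.
>     throw ProtobufHelper.getRemoteException(e);
>   }
> }
> {noformat}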
> However, for the non-HA case, the call stack is DFSClient -> 
> ClientNamenodeProtocolTranslatorPB -> RetryInvocationHandler -> 
> ProtobufRpcEngine. RetryInvocationHandler gets the ServiceException rather than 
> the unwrapped RemoteException, so the call can't be retried properly.
> {noformat}
> at org.apache.hadoop.ipc.Client.call
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
> at com.sun.proxy.$Proxy9.getListing
> at sun.reflect.NativeMethodAccessorImpl.invoke0
> at sun.reflect.NativeMethodAccessorImpl.invoke
> at sun.reflect.DelegatingMethodAccessorImpl.invoke
> at java.lang.reflect.Method.invoke
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
> at com.sun.proxy.$Proxy9.getListing
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing
> at org.apache.hadoop.hdfs.DFSClient.listPaths
> {noformat}
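> To make the failure mode concrete, here is a minimal illustration (values are 
> made up) of what the retry layer receives in the non-HA stack: the 
> RemoteException is still wrapped in a ServiceException, so policies keyed on 
> RemoteException or specific IOException types never match it.
> {noformat}
> // Illustrative only: the exception as seen by RetryInvocationHandler.
> Exception fromRpc = new com.google.protobuf.ServiceException(
>     new org.apache.hadoop.ipc.RemoteException(
>         "org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException",
>         "file is already being created"));
> // false -> the retry policy gives up instead of retrying the RemoteException.
> boolean retriable = fromRpc instanceof org.apache.hadoop.ipc.RemoteException;
> {noformat}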
> Perhaps we can fix it by having NN proxy creation wrap RetryInvocationHandler 
> around ClientNamenodeProtocolTranslatorPB and the other PB translators, instead 
> of the current wrap order.
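> A sketch of the proposed ordering (the helper name createNonHaProxy is 
> hypothetical; the real change would live in the non-HA proxy creation path): 
> build the raw PB proxy, wrap it in the translator, then put the retry proxy on 
> top so RetryInvocationHandler sees unwrapped exceptions.
> {noformat}
> import org.apache.hadoop.hdfs.protocol.ClientProtocol;
> import org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB;
> import org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB;
> import org.apache.hadoop.io.retry.RetryPolicy;
> import org.apache.hadoop.io.retry.RetryProxy;
> 
> // Hypothetical helper showing the proposed wrap order for the non-HA case:
> // translator below (unwraps ServiceException), retry handler on top.
> static ClientProtocol createNonHaProxy(ClientNamenodeProtocolPB rawRpcProxy,
>     RetryPolicy defaultPolicy) {
>   ClientProtocol translator =
>       new ClientNamenodeProtocolTranslatorPB(rawRpcProxy);
>   return (ClientProtocol) RetryProxy.create(ClientProtocol.class, translator,
>       defaultPolicy);
> }
> {noformat}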



--
This message was sent by Atlassian JIRA
(v6.2#6252)
