[ 
https://issues.apache.org/jira/browse/HDFS-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002739#comment-17002739
 ] 

Fei Hui edited comment on HDFS-15078 at 12/24/19 10:23 AM:
-----------------------------------------------------------

{quote}
The issue is the first router which sent the request that late, That client did 
failover to another router, triggered a new call and the second router 
completed the call, and the first call came after this. 
{quote}
Getting an EOFException makes the client fail over to another router. 
The second router then completes the call, and the first router sends its 
request late. If the first router merely sent the request late, the client 
would not get an exception and so would not fail over.

{quote}
If the client crashed post the check, this scenario will again come, This 
doesn't seems to be a problem with the client crashing and the Router sending 
the request still to Namenode,

If such a case where one Router is delaying, I think without client connection 
crashing still issues like these can come up.
{quote}
Yes. This issue can only resolve the problem in some scenarios; it is just an 
improvement. HDFS-15079 tracks the high-level problem.

In our scenarios, this fix works.
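
To illustrate the idea (this is a minimal sketch, not the attached patch), the 
check amounts to looking at the client's channel right before forwarding the 
call. All names below (ChannelCheckSketch, forwardIfClientConnected, 
ClientGoneException, FakeChannel) are hypothetical and are not Hadoop APIs; 
only java.nio.channels.Channel is real. Note that isOpen() only turns false 
once the server side has noticed the disconnect and closed the channel, so 
this narrows the window rather than removing it.

```java
import java.nio.channels.Channel;
import java.util.function.Supplier;

/** Hypothetical sketch; these names are illustrative, not the Hadoop RBF API. */
final class ChannelCheckSketch {

    /** Signals that the caller's connection was already closed. */
    static final class ClientGoneException extends RuntimeException {
        ClientGoneException(String message) {
            super(message);
        }
    }

    /** In-memory stand-in for a client socket channel, for demonstration only. */
    static final class FakeChannel implements Channel {
        private volatile boolean open = true;

        @Override
        public boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /**
     * Runs the downstream call (for example, an RPC to the namenode) only if
     * the client's channel is still open; otherwise skips it and throws.
     */
    static <T> T forwardIfClientConnected(Channel clientChannel,
                                          Supplier<T> downstreamCall) {
        if (!clientChannel.isOpen()) {
            // The client has gone away (and may already have failed over to
            // another router), so do not send a stale request downstream.
            throw new ClientGoneException(
                "client channel closed; skipping downstream call");
        }
        return downstreamCall.get();
    }
}
```

With an open channel the supplier runs and its result is returned; after 
close() the supplier is never invoked, so a late create is not sent.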



was (Author: ferhui):
{quote}
The issue is the first router which sent the request that late, That client did 
failover to another router, triggered a new call and the second router 
completed the call, and the first call came after this. 
{quote}
Getting an EOFException makes the client fail over to another router. 
The second router then completes the call, and the first router sends its 
request late. If the first router merely sent the request late, the client 
would not get an exception and so would not fail over.

{quote}
If such a case where one Router is delaying, I think without client connection 
crashing still issues like these can come up.
{quote}
Yes. This issue can only resolve the problem in some scenarios. HDFS-15079 
tracks the high-level problem.

In our scenarios, this fix works.


> RBF: Should check connection channel before sending rpc to namenode
> -------------------------------------------------------------------
>
>                 Key: HDFS-15078
>                 URL: https://issues.apache.org/jira/browse/HDFS-15078
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rbf
>    Affects Versions: 3.3.0
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>         Attachments: HDFS-15078.001.patch, HDFS-15078.002.patch
>
>
> dfsrouter logs show that
> {quote}
> 2019-12-20 04:11:26,724 WARN org.apache.hadoop.ipc.Server: IPC Server handler 6400 on 8888, call org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 10.83.164.11:56908 Call#2 Retry#0: output error
> 2019-12-20 04:11:26,724 INFO org.apache.hadoop.ipc.Server: IPC Server handler 125 on 8888 caught an exception
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2731)
>         at org.apache.hadoop.ipc.Server.access$2100(Server.java:134)
>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1089)
>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1161)
>         at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2109)
>         at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1229)
>         at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:631)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2245)
> {quote}
> Maybe it is better to check the connection between the client and the router 
> before sending the RPC to the namenode.



