[ 
https://issues.apache.org/jira/browse/HBASE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113766#comment-14113766
 ] 

Nicolas Liochon commented on HBASE-11835:
-----------------------------------------

bq. You think that on reconnect of a connection, the server will send the other 
half of a response? It won't have just dropped it?
No, but we don't reconnect on a SocketTimeoutException, so we can have the 
following scenario:
 - the read of a call starts. We read 50% of the data
 - for whatever reason, the transfer hangs
 - we get a socket timeout on the reader
 - with the code above, we put the exception in the call, but we continue with 
the same socket
 - we receive the remaining 50% of the previous call
 - but we think it's a new one.
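
The scenario above can be reproduced in isolation with a minimal sketch. This 
is not the HBase RPC code: the wire format (a 4-byte length followed by the 
payload), the class name FramingDesyncDemo and its methods are all assumptions 
made up for the illustration, and an in-memory stream stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustration only: a hypothetical length-prefixed protocol, not the HBase wire format.
class FramingDesyncDemo {

    // Encodes one response as a 4-byte big-endian length followed by the payload.
    static byte[] frame(String payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] body = payload.getBytes(StandardCharsets.US_ASCII);
        out.writeInt(body.length);
        out.write(body);
        return bytes.toByteArray();
    }

    // Returns the "length" the reader sees when it resumes in the middle of a response.
    static int simulateDesync() {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            wire.write(frame("AAAAAAAAAAAAAAAA")); // first response, 16 bytes of payload
            wire.write(frame("second"));           // second response, 6 bytes of payload
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));

            // The read of the first call starts: we consume the length and 50% of the data.
            int length = in.readInt();
            in.readFully(new byte[length / 2]);

            // Simulated socket timeout: the exception is put in the call,
            // but the same stream stays in use and nothing is skipped.

            // The reader now expects a new response and reads a length prefix.
            // It actually gets four payload bytes of the previous response.
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints 41414141 ('A' is 0x41), not the real length of the second response (6).
        System.out.println(Integer.toHexString(simulateDesync()));
    }
}
```

In this sketch the reader takes payload bytes for a length prefix, so every 
read after the timeout is misaligned. That is why the stream cannot be trusted 
once a read has been abandoned partway through a response.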

With our current timeout it's unlikely (but very scary on 0.98 considering 
what we do with the socket timeout there). Lower the timeout (in 1.0, with 
the separation we now have, we could have a timeout of 5s), and this will occur 
much more often. On the other hand, the connection ends up being closed in 
both cases. Still, it needs to be fixed. I will do it in a separate jira 
after this one.

 

> Wrong managenement of non expected calls in the client
> ------------------------------------------------------
>
>                 Key: HBASE-11835
>                 URL: https://issues.apache.org/jira/browse/HBASE-11835
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.0.0, 2.0.0, 0.98.6
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>             Fix For: 1.0.0, 2.0.0, 0.98.7
>
>         Attachments: rpcClient.patch
>
>
> If a call is purged or canceled we try to skip the reply from the server, but 
> we read the wrong number of bytes so we corrupt the tcp channel. It's hidden 
> as it triggers retry and so on, but it's obviously bad for performance.
> It happens with cell blocks.
> [~ram_krish_86], [[email protected]], you know this part better than me, 
> do you agree with the analysis and the patch?
> The changes in rpcServer are not fully related: as the client closes the 
> connections in such situations, I observed both ClosedChannelException and 
> CancelledKeyException.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
