[
https://issues.apache.org/jira/browse/HBASE-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821177#comment-15821177
]
Duo Zhang commented on HBASE-17453:
-----------------------------------
I'd say this is not only an AWS problem. In the new rpc implementation in 2.0,
we will not throw SocketTimeoutException when an rpc call times out, which
means we will not close the connection. So if the remote machine has crashed,
the broken connection will be stuck for hours until TCP keepalive finally
closes it.
In our internal HBase version I have added a ping feature at the rpc layer. I
have also considered simply calling a method of the remote service to test
whether the connection is still alive, which is simple but useful. I think both
solutions are acceptable. Let's see your patch.
Thanks for your contribution.
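
For illustration, the idle-ping idea discussed in this issue (send a ping
after a quiet period, reconnect if no reply comes back) could be sketched
roughly as follows. This is only a sketch under assumptions, not the HBase or
AsyncHBase implementation: the class IdlePingMonitor, its Action enum, and the
ping and reconnect callbacks are all hypothetical names, and the caller is
assumed to supply the clock and to apply its own ping timeout inside the
callback.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of idle-connection detection. Not HBase or AsyncHBase
 * code; all names here are made up for illustration.
 */
final class IdlePingMonitor {
    enum Action { NONE, PING_OK, RECONNECT }

    private final long idleThresholdMs;
    // Assumed to return true only if a ping reply arrived within the
    // caller's own ping timeout.
    private final BooleanSupplier ping;
    // Assumed to tear down the broken connection and open a new one.
    private final Runnable reconnect;
    private long lastActivityMs;

    IdlePingMonitor(long idleThresholdMs, long nowMs,
                    BooleanSupplier ping, Runnable reconnect) {
        this.idleThresholdMs = idleThresholdMs;
        this.lastActivityMs = nowMs;
        this.ping = ping;
        this.reconnect = reconnect;
    }

    /** Call whenever real traffic is sent or received on the connection. */
    void recordActivity(long nowMs) {
        lastActivityMs = nowMs;
    }

    /** Call periodically, for example from a timer task. */
    Action check(long nowMs) {
        if (nowMs - lastActivityMs < idleThresholdMs) {
            return Action.NONE;           // recent traffic, nothing to do
        }
        if (ping.getAsBoolean()) {
            lastActivityMs = nowMs;       // a ping reply counts as activity
            return Action.PING_OK;
        }
        reconnect.run();                  // no reply: treat connection as dead
        lastActivityMs = nowMs;           // start a fresh idle window
        return Action.RECONNECT;
    }
}
```

Passing the current time in explicitly keeps the sketch free of threads and
timers, so the decision logic can be exercised on its own. In a real client,
check() would presumably be driven by a timer and the ping callback would
issue the actual rpc.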
> add Ping into HBase server for deprecated GetProtocolVersion
> ------------------------------------------------------------
>
> Key: HBASE-17453
> URL: https://issues.apache.org/jira/browse/HBASE-17453
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 1.2.2
> Reporter: Tianying Chang
> Assignee: Tianying Chang
> Priority: Minor
>
> Our HBase service is hosted in AWS. We saw cases where the connection between
> the client (AsyncHBase in our case) and the server stopped working but did not
> throw any exception, so traffic got stuck. So we added a "Ping" feature in
> AsyncHBase 1.5 using the GetProtocolVersion() API provided on the RS side:
> if there is no traffic for a given time, we send the "Ping"; if no response
> comes back for the "Ping", we assume the connection is bad and reconnect.
> Now we are upgrading our cluster from 94 to 1.2. However, GetProtocolVersion()
> is deprecated. To support the same detect/reconnect feature, we added
> Ping() in our internal HBase 1.2 branch, and also patched AsyncHBase 1.7
> accordingly.
> We would like to open source this feature since it is useful for use cases in
> AWS environments.
> We used GetProtocolVersion in AsyncHBase to detect unhealthy connection to RS
> since in AWS, sometimes it enters a state the connection