[ 
https://issues.apache.org/jira/browse/HBASE-29180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17934610#comment-17934610
 ] 

Andrew Kyle Purtell commented on HBASE-29180:
---------------------------------------------

+1

An UnknownHostException (UHE) can be caused by transient issues, but if it 
repeats it is likely to be persistent, especially in a dynamic environment 
like k8s.
The default number of retries after HBASE-28638 is 5, which is sufficient to 
distinguish between the two cases.
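
For illustration only, the fail-fast check discussed here could look roughly 
like the sketch below. It is not the RSProcedureDispatcher code: the class 
name, method names, and the ">=" comparison are hypothetical, the limit of 5 
is taken from the sentence above, and walking the cause chain is an 
assumption based on the exception appearing wrapped in the log message.

```java
import java.io.IOException;
import java.net.UnknownHostException;

// Hypothetical sketch; names and structure are NOT taken from HBase.
final class FailFastRetrySketch {

    // Assumed limit, matching the default of 5 retries mentioned above.
    static final int FAIL_FAST_RETRY_LIMIT = 5;

    // True if the error, or anything in its cause chain, is an
    // UnknownHostException. Walking the chain is an assumption made
    // because the logged exception appears wrapped.
    static boolean isFailFastError(Throwable error) {
        for (Throwable cur = error; cur != null; cur = cur.getCause()) {
            if (cur instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    // Give up only when the error is a fail-fast error AND the number of
    // attempts has reached the limit; all other errors keep retrying.
    static boolean shouldGiveUp(Throwable error, int attempts) {
        return isFailFastError(error) && attempts >= FAIL_FAST_RETRY_LIMIT;
    }

    public static void main(String[] args) {
        Throwable wrapped =
            new IOException("call failed", new UnknownHostException("rs1.xyz:60020"));
        // A couple of failures may be transient, so keep retrying.
        System.out.println(shouldGiveUp(wrapped, 2));
        // Repeated failures up to the limit are treated as persistent.
        System.out.println(shouldGiveUp(wrapped, 5));
        // An unrelated error is not covered by the fail-fast limit.
        System.out.println(shouldGiveUp(new IOException("timeout"), 2867));
    }
}
```

With a check of this shape, the first few failures are still retried, so a 
short DNS hiccup is tolerated, while a host that stays unresolvable stops 
being retried once the limit is reached.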

> Apply fail-fast retry limit for UnknownHostException
> ----------------------------------------------------
>
>                 Key: HBASE-29180
>                 URL: https://issues.apache.org/jira/browse/HBASE-29180
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.5.11
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3
>
>
> As part of HBASE-28638, a fail-fast retry limit was introduced for errors 
> like CallQueueTooBigException, SaslException, ConnectionClosedException. This 
> helps limit the number of retries that RSProcedureDispatcher has to perform 
> while executing remote procedures. Since the region open/close fails on the 
> remote server, we also trigger SCP for the target server.
> We recently came across UnknownHostException as another example of where the 
> remote calls can get stuck forever:
> {code:java}
> WARN  [RSProcedureDispatcher-pool-98034] procedure.RSProcedureDispatcher - 
> request to rs1.xyz,60020,1739254267238 failed due to 
> java.net.UnknownHostException: Call to address=rs1.xyz:60020 failed on local 
> exception: java.net.UnknownHostException: rs1.xyz:60020 could not be 
> resolved, try=2867, retrying... , request params: open_region {
>   open_info {
>     region {
> ...
> ... {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
