[ 
https://issues.apache.org/jira/browse/HBASE-29319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani resolved HBASE-29319.
----------------------------------
    Fix Version/s: 2.7.0
                   3.0.0-beta-2
                   2.6.3
                   2.5.12
     Hadoop Flags: Reviewed
       Resolution: Fixed

> Apply fail-fast retry limit for ConnectException
> ------------------------------------------------
>
>                 Key: HBASE-29319
>                 URL: https://issues.apache.org/jira/browse/HBASE-29319
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.5.11, 2.6.2
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.3, 2.5.12
>
>
> As part of HBASE-28638 and HBASE-29180, fail-fast retry limit has been 
> introduced for errors like CallQueueTooBigException, SaslException, 
> ConnectionClosedException, UnknownHostException. This helps limit the num of 
> retries that RSProcedureDispatcher has to perform while executing remote 
> procedures. Since the region open/close fails on the remote server, we also 
> trigger SCP for the target server.
> We recently came across ConnectException as well:
> {code:java}
> 2025-04-18 15:16:34,013 DEBUG [RSProcedureDispatcher-pool-50043] 
> procedure.RSProcedureDispatcher - Request to xyz.abc,60020,1744167387805 
> failed, try=0
> java.net.ConnectException: Call to address=xyz.abc:60020 failed on connection 
> exception: 
> org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: 
> connection timed out after 10000 ms: xyz.abc:60020
>     at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:204)
>     at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:391)
>     at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
>     at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425)
>     at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:420)
>     at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114)
>     at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129)
>     at 
> org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:107)
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:398)
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:376)
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:368)
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1425)
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:396)
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:376)
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:912)
>     at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection.failInit(NettyRpcConnection.java:207)
>     at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$900(NettyRpcConnection.java:82)
>     at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection$2.fail(NettyRpcConnection.java:334)
>     at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection$2.operationComplete(NettyRpcConnection.java:342)
>     at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection$2.operationComplete(NettyRpcConnection.java:316)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:616)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:405)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: 
> org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: 
> connection timed out after 10000 ms: xyz.abc:60020
>     at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615)
>     ... 10 more {code}
> and so on...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to