[ https://issues.apache.org/jira/browse/HBASE-30245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
samad updated HBASE-30245:
--------------------------
Description:
Summary: Under a narrow but real Kubernetes condition — a RegionServer pod's IP
is reassigned to a different live pod (sometimes a RegionServer of an entirely
different HBase cluster) — the async client gets stuck issuing requests to that
wrong-but-live server and receives a continuous stream of
{{NotServingRegionException}} (NSRE) for the affected regions. The condition
does not self-heal: only a client process restart fixes it.
We run HBase on Kubernetes, where each pod co-hosts a RegionServer and a
DataNode. Multiple independent HBase clusters share the same Kubernetes
environment.
We observed a failure scenario during node maintenance where an HBase async
client can become permanently stuck talking to the wrong RegionServer after
Kubernetes pod IP reuse.
Consider the following example:
* *Pod A* hosts *RegionServer A* belonging to {*}HBase Cluster A{*}.
* *Pod B* hosts *RegionServer B* belonging to {*}HBase Cluster B{*}.
* Both pods are running on the same Kubernetes node.
During a maintenance activity (node reboot, drain, upgrade, etc.), all pods on
the node restart.
A possible sequence is:
# Pod A goes down.
# Kubernetes later reassigns Pod A's old IP address to Pod B.
# The client already has an established TCP connection to Pod A's old IP.
# Because the connection remains alive through the networking/service-mesh
layer, the client does not see a transport failure.
# Requests intended for RegionServer A are now delivered to RegionServer B,
which belongs to a completely different HBase cluster and has never hosted the
requested regions.
# RegionServer B correctly responds with {{NotServingRegionException}} (NSRE).
# The client keeps reusing the same underlying RPC connection and receives the
following exception continuously:
{noformat}
2026-06-09T06:03:15.226Z, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: app_ns:my-table,rowprefix-0001,1761217611582.65e37b957f2a12c0c710d3866bece520. is not online on hbase-B-dn-4.hbase-B.k8s-namespace.svc.cluster.local,16020,1780977222833
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3552)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3530)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1486)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2972)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:44994)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
{noformat}
Note: The table {{app_ns:my-table}} belongs to HBase cluster-A. The RegionServer
named in the NSRE ({{hbase-B-dn-4}}) belongs to a completely different HBase
cluster (cluster-B). This table has never existed on cluster-B. The client is
reaching cluster-B's RegionServer only because cluster-B's pod acquired the IP
address that previously belonged to cluster-A's pod (hbase-A-dn-7). After a
Kubernetes node maintenance event, both pods restarted on the same node.
Cluster-B's pod came up first and was assigned cluster-A's old pod IP. The
client's existing TCP channel was pinned to that IP and never re-resolved DNS,
so all RPCs intended for cluster-A's RegionServer are landing on cluster-B's
RegionServer instead.
# Requests continue hitting RegionServer B and receive NSREs indefinitely.
In production, we observed this condition persisting for approximately an hour
and generating tens of thousands of NSREs. Recovery occurred only after
restarting the HBase client process.
*Observed Behavior*
* Continuous NSREs for the same regions.
* The same RegionServer appears in all NSRE responses.
* The responding RegionServer belongs to a different HBase cluster than the
target table.
* No transport errors or connection failures are observed.
* Client restart immediately restores normal operation.
*Expected Behavior*
When the client receives repeated NSREs from a RegionServer that does not match
the expected destination, it should eventually drop the existing connection and
establish a fresh one, allowing DNS re-resolution and recovery without
requiring a client restart.
h3. Problem Identified
The HBase async client's {{NettyRpcConnection}} resolves DNS exactly once —
when the channel is first created — and never re-checks it for the lifetime of
that channel. If the underlying IP changes (e.g., Kubernetes pod IP reuse), the
channel remains pinned to the old (now wrong) IP indefinitely. An NSRE is an
application-level response and does not trigger channel closure, so the client
never gets a chance to re-resolve DNS.
h4. 1. DNS resolution happens only once per channel, inside {{connect()}}
While {{channel != null}}, the existing channel is reused indefinitely and
DNS is never re-checked.
{code:java|title=NettyRpcConnection.java — sendRequest0()}
@Override
public void run(boolean cancelled) throws IOException {
  if (cancelled) {
    setCancelled(call);
  } else {
    if (channel == null) { // ← ONLY path to DNS resolution
      connect();
    }
    scheduleTimeoutTask(call);
    NettyFutureUtils.addListener(channel.writeAndFlush(call), new ChannelFutureListener() {
      @Override
      public void operationComplete(ChannelFuture future) throws Exception {
        if (!future.isSuccess()) {
          call.setException(toIOE(future.cause()));
        }
      }
    });
  }
}
{code}
h4. 2. NSRE does not close the channel
The {{channel = null}} reset lives only in {{shutdown0()}}, which is
triggered by transport failures ({{channelInactive}}, {{exceptionCaught}}) or
explicit shutdown, never by an application-level NSRE response:
{code:java|title=NettyRpcConnection.java — shutdown0()}
private void shutdown0() {
  assert eventLoop.inEventLoop();
  if (channel != null) {
    NettyFutureUtils.consume(channel.close());
    channel = null; // ← ONLY place channel becomes null
  }
}
{code}
{code:java|title=NettyRpcDuplexHandler.java — transport-level triggers only}
@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
  if (!id2Call.isEmpty()) {
    cleanupCalls(new ConnectionClosedException("Connection closed"));
  }
  conn.shutdown(); // ← called on TCP break (RST, FIN)
  ctx.fireChannelInactive();
}

@Override
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
  if (!id2Call.isEmpty()) {
    cleanupCalls(IPCUtil.toIOE(cause));
  }
  conn.shutdown(); // ← called on transport error (broken pipe, etc.)
}
{code}
An NSRE is a normal application-level response on a healthy TCP socket. It
triggers neither {{channelInactive}} nor {{exceptionCaught}}. Therefore
{{shutdown0()}} is never called, {{channel}} stays non-null, and {{connect()}}
(with fresh DNS) is never invoked again.
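The pinning described above can be reproduced in a toy model. The sketch below is not HBase code: the class {{PinnedChannelModel}}, its two maps, and the host name and IP addresses are all hypothetical stand-ins for DNS, the pod network, and {{NettyRpcConnection}}. It only illustrates that a "resolve when no channel exists" gate keeps talking to whoever holds the old IP until something drops the channel.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not HBase code): a connection that resolves its host only when it has no channel. */
final class PinnedChannelModel {
  /** Hypothetical stand-in for DNS: hostname -> current IP. */
  static final Map<String, String> dns = new HashMap<>();
  /** Hypothetical stand-in for the pod network: IP -> server currently holding that IP. */
  static final Map<String, String> serverAtIp = new HashMap<>();

  static final class Connection {
    private final String host;
    private String channelIp; // null plays the role of "channel == null"

    Connection(String host) {
      this.host = host;
    }

    /** Mirrors the reuse gate: DNS is consulted only when there is no channel. */
    String send() {
      if (channelIp == null) {
        channelIp = dns.get(host); // the only lookup in the channel's lifetime
      }
      return serverAtIp.get(channelIp); // whoever holds that IP answers
    }

    /** Mirrors shutdown0(): dropping the channel is what permits re-resolution. */
    void shutdown() {
      channelIp = null;
    }
  }

  public static void main(String[] args) {
    dns.put("rs-a", "10.0.0.5");
    serverAtIp.put("10.0.0.5", "RegionServer A");
    Connection conn = new Connection("rs-a");
    System.out.println(conn.send()); // RegionServer A

    // IP reuse: pod B takes 10.0.0.5, pod A returns on 10.0.0.9, DNS is updated.
    serverAtIp.put("10.0.0.5", "RegionServer B");
    serverAtIp.put("10.0.0.9", "RegionServer A");
    dns.put("rs-a", "10.0.0.9");
    System.out.println(conn.send()); // RegionServer B (channel still pinned)

    conn.shutdown();
    System.out.println(conn.send()); // RegionServer A (fresh lookup)
  }
}
```

In this model, as in the report, nothing short of an explicit {{shutdown()}} moves the connection off the stale IP, because an application-level error never clears the channel.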
h3. Proposed Fixes — Requesting Guidance
We would appreciate committer guidance on which of the following approaches is
preferred. We are happy to contribute the patch and tests for whichever
direction the project prefers.
h4. Option 1: Peer-vs-DNS drift check at the channel-reuse gate
In {{NettyRpcConnection.sendRequest0()}}, add a throttled re-check: if
{{channel.remoteAddress().getAddress()}} differs from
{{InetAddress.getByName(host)}}, call {{shutdown0()}} so that the existing
{{if (channel == null) connect();}} path re-resolves DNS. This is a
connection-layer change only, with no protocol change.
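A minimal standalone sketch of the throttled drift check in Option 1 might look like the following. It is not a patch against {{NettyRpcConnection}}: the class {{DnsDriftCheck}}, the {{Resolver}} hook, the injected clock, the host name, and the 30-second interval are all assumptions made so the logic can be exercised without real DNS. In a real change the resolver would presumably be {{InetAddress::getByName}}, which blocks and is subject to the JVM's own address cache ({{networkaddress.cache.ttl}}), so detection could lag behind the actual DNS change.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.LongSupplier;

/** Sketch only: throttled comparison of the connected IP against current DNS. */
final class DnsDriftCheck {
  /** Hypothetical resolver hook; production code would likely pass InetAddress::getByName. */
  interface Resolver {
    InetAddress resolve(String host) throws UnknownHostException;
  }

  private final String host;
  private final InetAddress connectedIp;
  private final long intervalMs;
  private final Resolver resolver;
  private final LongSupplier clock;
  private long lastCheckMs;

  DnsDriftCheck(String host, InetAddress connectedIp, long intervalMs,
      Resolver resolver, LongSupplier clock) {
    this.host = host;
    this.connectedIp = connectedIp;
    this.intervalMs = intervalMs;
    this.resolver = resolver;
    this.clock = clock;
    this.lastCheckMs = clock.getAsLong(); // connecting counts as a fresh check
  }

  /** Returns true only when the interval has elapsed and DNS now points elsewhere. */
  boolean hasIpChanged() {
    long now = clock.getAsLong();
    if (now - lastCheckMs < intervalMs) {
      return false; // throttled: skip the lookup
    }
    lastCheckMs = now;
    try {
      return !resolver.resolve(host).equals(connectedIp);
    } catch (UnknownHostException e) {
      return false; // a failed lookup is not evidence of drift
    }
  }

  public static void main(String[] args) throws Exception {
    InetAddress oldIp = InetAddress.getByAddress(new byte[] {10, 0, 0, 5});
    InetAddress newIp = InetAddress.getByAddress(new byte[] {10, 0, 0, 9});
    long[] now = {0};
    DnsDriftCheck check =
        new DnsDriftCheck("rs-a.example", oldIp, 30_000, h -> newIp, () -> now[0]);
    System.out.println(check.hasIpChanged()); // false (inside the throttle window)
    now[0] = 30_000;
    System.out.println(check.hasIpChanged()); // true (DNS now returns a different IP)
  }
}
```

The caller would treat a {{true}} result the same way as a missing channel: shut the channel down and let the existing connect path run.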
h4. Option 2: Responder-identity check
On any error carrying a server identity, compare the responder's {{ServerName}}
against {{loc.getServerName()}}. If they mismatch, call
{{AbstractRpcClient.cancelConnections(loc.getServerName())}} to force
re-resolve on the next send.
Today the NSRE response only carries the responder identity inside the
exception message text. {{ExceptionResponse}} reserves
{{hostname}}/{{port}} fields for {{RegionMovedException}} only.
Extending the wire format with a {{responder_server_name}} field would let the
client do this structurally rather than parsing strings.
Hook point: {{AsyncRegionLocatorHelper.updateCachedLocationOnError}} already
receives both {{loc}} and the cause.
*Pros:* Most precise detection — only triggers when the responder is
definitively wrong. *Cons:* Requires protobuf/wire-format change.
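To make the comparison in Option 2 concrete, here is a sketch of the string-parsing fallback that a structured {{responder_server_name}} field would replace. Everything in it is an assumption for illustration: the class {{ResponderIdentity}}, its method names, and the expected host {{rs-a.example}} are hypothetical, and the pattern is derived only from the message in the stack trace above, which is not a stable contract.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: extract the responder from NSRE message text and compare it to the target. */
final class ResponderIdentity {
  // Assumed message shape, taken from the trace: "... is not online on host,port,startcode"
  private static final Pattern NOT_ONLINE_ON =
      Pattern.compile("is not online on\\s+([^,\\s]+),(\\d+),(\\d+)");

  /** Returns "host:port" of the responder, if the message carries one. */
  static Optional<String> responderHostPort(String message) {
    Matcher m = NOT_ONLINE_ON.matcher(message);
    return m.find() ? Optional.of(m.group(1) + ":" + m.group(2)) : Optional.empty();
  }

  /** True only when a responder identity is present and differs from the expected one. */
  static boolean isWrongResponder(String expectedHostPort, String message) {
    return responderHostPort(message)
        .map(actual -> !actual.equalsIgnoreCase(expectedHostPort))
        .orElse(false); // no identity in the message: make no claim
  }

  public static void main(String[] args) {
    String nsre = "app_ns:my-table,rowprefix-0001,1761217611582."
        + "65e37b957f2a12c0c710d3866bece520. is not online on "
        + "hbase-B-dn-4.hbase-B.k8s-namespace.svc.cluster.local,16020,1780977222833";
    // "rs-a.example:16020" is a hypothetical expected location for the region.
    System.out.println(isWrongResponder("rs-a.example:16020", nsre)); // true
  }
}
```

A mismatch here is the signal on which the client would cancel the pooled connection. Parsing free text like this is fragile, which is the argument for carrying the responder identity in a dedicated field.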
h3. Questions for Committers
# Is Option 1 (DNS drift check, pure client-side fix) acceptable?
# Would Option 2 (responder-identity with a proto field) be considered for a
more structural fix?
# Are there any prior JIRAs or discussions related to this that we should link
to, or any other solutions we should consider?
was:
Summary: Under a narrow but real Kubernetes condition — a RegionServer pod's IP
is reassigned to a different live pod (sometimes a RegionServer of an entirely
different HBase cluster) — the async client gets stuck issuing requests to that
wrong-but-live server and receives a continuous stream of
{{NotServingRegionException}} (NSRE) for the affected regions. The condition
does not self-heal: only a client process restart fixes it.
We run HBase on Kubernetes, where each pod co-hosts a RegionServer and a
DataNode. Multiple independent HBase clusters share the same Kubernetes
environment.
We observed a failure scenario during node maintenance where an HBase async
client can become permanently stuck talking to the wrong RegionServer after
Kubernetes pod IP reuse.
Consider the following example:
* *Pod A* hosts *RegionServer A* belonging to {*}HBase Cluster A{*}.
* *Pod B* hosts *RegionServer B* belonging to {*}HBase Cluster B{*}.
* Both pods are running on the same Kubernetes node.
During a maintenance activity (node reboot, drain, upgrade, etc.), all pods on
the node restart.
A possible sequence is:
# Pod A goes down.
# Kubernetes later reassigns Pod A's old IP address to Pod B.
# The client already has an established TCP connection to Pod A's old IP.
# Because the connection remains alive through the networking/service-mesh
layer, the client does not see a transport failure.
# Requests intended for RegionServer A are now delivered to RegionServer B,
which belongs to a completely different HBase cluster and has never hosted the
requested regions.
# RegionServer B correctly responds with {{NotServingRegionException}} (NSRE).
# The client keeps reusing the same underlying RPC connection and receives the
following exception continuously:
{noformat}
2026-06-09T06:03:15.226Z, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: app_ns:my-table,rowprefix-0001,1761217611582.65e37b957f2a12c0c710d3866bece520. is not online on hbase-B-dn-4.hbase-B.k8s-namespace.svc.cluster.local,16020,1780977222833
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3552)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3530)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1486)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2972)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:44994)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
{noformat}
Note: The table {{app_ns:my-table}} belongs to HBase cluster-A. The RegionServer
named in the NSRE ({{hbase-B-dn-4}}) belongs to a completely different HBase
cluster (cluster-B). This table has never existed on cluster-B. The client is
reaching cluster-B's RegionServer only because cluster-B's pod acquired the IP
address that previously belonged to cluster-A's pod (hbase-A-dn-7). After a
Kubernetes node maintenance event, both pods restarted on the same node.
Cluster-B's pod came up first and was assigned cluster-A's old pod IP. The
client's existing TCP channel was pinned to that IP and never re-resolved DNS,
so all RPCs intended for cluster-A's RegionServer are landing on cluster-B's
RegionServer instead.
# Requests continue hitting RegionServer B and receive NSREs indefinitely.
In production, we observed this condition persisting for approximately an hour
and generating tens of thousands of NSREs. Recovery occurred only after
restarting the HBase client process.
*Observed Behavior*
* Continuous NSREs for the same regions.
* The same RegionServer appears in all NSRE responses.
* The responding RegionServer belongs to a different HBase cluster than the
target table.
* No transport errors or connection failures are observed.
* Client restart immediately restores normal operation.
*Expected Behavior*
When the client receives repeated NSREs from a RegionServer that does not match
the expected destination, it should eventually drop the existing connection and
establish a fresh one, allowing DNS re-resolution and recovery without
requiring a client restart.
h3. Problem Identified
The HBase async client's {{NettyRpcConnection}} resolves DNS exactly once,
when the channel is first created, and never re-checks it for the lifetime of
that channel. If the underlying IP changes (e.g., Kubernetes pod IP reuse), the
channel remains pinned to the old (now wrong) IP indefinitely. An NSRE is an
application-level response and does not trigger channel closure, so the client
never gets a chance to re-resolve DNS.
h4. 1. DNS resolution happens only once per channel, inside {{connect()}}
While {{channel != null}}, the existing channel is reused indefinitely and DNS
is never re-checked.
{code:java|title=NettyRpcConnection.java — sendRequest0()}
@Override
public void run(boolean cancelled) throws IOException {
  if (cancelled) {
    setCancelled(call);
  } else {
    if (channel == null) { // ← ONLY path to DNS resolution
      connect();
    }
    scheduleTimeoutTask(call);
    NettyFutureUtils.addListener(channel.writeAndFlush(call), new ChannelFutureListener() {
      @Override
      public void operationComplete(ChannelFuture future) throws Exception {
        if (!future.isSuccess()) {
          call.setException(toIOE(future.cause()));
        }
      }
    });
  }
}
{code}
h4. 2. NSRE does not close the channel
The {{channel = null}} reset lives only in {{shutdown0()}}, which is
triggered by transport failures ({{channelInactive}}, {{exceptionCaught}}) or
explicit shutdown, never by an application-level NSRE response:
{code:java|title=NettyRpcConnection.java — shutdown0()}
private void shutdown0() {
  assert eventLoop.inEventLoop();
  if (channel != null) {
    NettyFutureUtils.consume(channel.close());
    channel = null; // ← ONLY place channel becomes null
  }
}
{code}
{code:java|title=NettyRpcDuplexHandler.java — transport-level triggers only}
@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
  if (!id2Call.isEmpty()) {
    cleanupCalls(new ConnectionClosedException("Connection closed"));
  }
  conn.shutdown(); // ← called on TCP break (RST, FIN)
  ctx.fireChannelInactive();
}

@Override
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
  if (!id2Call.isEmpty()) {
    cleanupCalls(IPCUtil.toIOE(cause));
  }
  conn.shutdown(); // ← called on transport error (broken pipe, etc.)
}
{code}
An NSRE is a normal application-level response on a healthy TCP socket. It
triggers neither {{channelInactive}} nor {{exceptionCaught}}. Therefore
{{shutdown0()}} is never called, {{channel}} stays non-null, and
{{connect()}} (with fresh DNS) is never invoked again.
h4. 3. The resulting loop
{noformat}
NSRE
→ region location cache evicted (correct)
→ meta re-read → returns same hostname (correct: region IS assigned to this RS)
→ connection pool lookup by hostname → returns same NettyRpcConnection
→ sendRequest0() → channel != null → skips connect() → skips DNS
→ writeAndFlush on channel pinned to wrong IP
→ reaches wrong RegionServer
→ NSRE
→ repeat forever
{noformat}
----
h3. Proposed Fixes — Requesting Guidance
We would appreciate committer guidance on which of the following approaches is
preferred. We are happy to contribute the patch and tests for whichever
direction the project prefers.
h4. Option 1: Peer-vs-DNS drift check at the channel-reuse gate
In {{NettyRpcConnection.sendRequest0()}}, periodically compare the channel's
connected IP against current DNS. If they differ, call {{shutdown0()}} so the
existing {{if (channel == null) connect();}} path re-resolves. Throttled to
avoid excessive DNS lookups (e.g., once every 30 seconds).
{code:java|title=Proposed change — NettyRpcConnection.java}
// New fields
private InetAddress channelConnectedIp;
private long lastDnsCheckTime;
public static final String DNS_CHECK_INTERVAL_KEY = "hbase.client.dns.check.interval.ms";
public static final long DNS_CHECK_INTERVAL_DEFAULT = 30_000;

// In connect():
private void connect() throws UnknownHostException {
  InetSocketAddress remoteAddr = getRemoteInetAddress(rpcClient.metrics);
  this.channelConnectedIp = remoteAddr.getAddress();
  this.lastDnsCheckTime = EnvironmentEdgeManager.currentTime();
  this.channel = new Bootstrap()
    .group(eventLoop).channel(rpcClient.channelClass)
    .remoteAddress(remoteAddr).connect().channel();
}

// New method:
private boolean hasIpChanged() {
  long now = EnvironmentEdgeManager.currentTime();
  long interval = rpcClient.conf.getLong(DNS_CHECK_INTERVAL_KEY, DNS_CHECK_INTERVAL_DEFAULT);
  if (now - lastDnsCheckTime < interval) {
    return false;
  }
  lastDnsCheckTime = now;
  try {
    InetAddress currentIp = InetAddress.getByName(remoteId.getAddress().getHostName());
    return !currentIp.equals(channelConnectedIp);
  } catch (UnknownHostException e) {
    return false;
  }
}

// Modified sendRequest0():
if (channel == null || hasIpChanged()) {
  if (channel != null) {
    LOG.warn("DNS for {} changed from {}. Reconnecting.",
      remoteId.getAddress(), channelConnectedIp);
    shutdown0();
  }
  connect();
}
{code}
*Pros:* Connection-layer only. No protocol change. Single file change. Zero
false positives — channel is only closed when the IP actually changed. Overhead
is one DNS lookup every 30s per RS connection (~0.1ms).
h4. Option 2: Responder-identity check
On any error carrying a server identity, compare the responder's
{{ServerName}} against {{loc.getServerName()}}. If they mismatch, call
{{AbstractRpcClient.cancelConnections(loc.getServerName())}} to force
re-resolve on the next send.
Today the NSRE response only carries the responder identity inside the
exception message text. {{ExceptionResponse}} reserves {{hostname}}/{{port}}
fields for {{RegionMovedException}} only. Extending the wire format with a
{{responder_server_name}} field would let the client do this structurally
rather than parsing strings.
Hook point: {{AsyncRegionLocatorHelper.updateCachedLocationOnError}} already
receives both {{loc}} and the cause.
*Pros:* Most precise detection — only triggers when the responder is
definitively wrong. *Cons:* Requires protobuf/wire-format change.
h4. Option 3: Consecutive NSRE counter per connection
Track consecutive NSREs per {{NettyRpcConnection}}. After N consecutive NSREs
(e.g., 3), close the channel. Reset the counter on any successful response.
*Pros:* Simple, no protocol change. *Cons:* Slightly less precise than Option 1
— relies on a threshold heuristic rather than direct IP comparison.
----
h3. Questions for Committers
# Is Option 1 (DNS drift check, pure client-side fix) acceptable for
{{master}} and {{branch-2.5}}?
# Would Option 2 (responder-identity with a proto field) be considered for a
more structural fix?
# Are there any prior JIRAs or discussions related to this that we should link
to?
> RPC connection pinned to stale IP after cross-pod IP reuse: NSRE storm
> persists indefinitely because pooled channel is reused without re-resolving
> DNS
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-30245
> URL: https://issues.apache.org/jira/browse/HBASE-30245
> Project: HBase
> Issue Type: Bug
> Environment: * HBase 2.5.12 client (present in other client versions
> as well)
> * HBase deployed on Kubernetes.
> * RegionServer and DataNode are co-hosted in the same pod.
> * Multiple HBase clusters run in the same Kubernetes environment.
> Reporter: samad
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)