Li Chao created HBASE-28119:
-------------------------------
Summary: LogRoller stuck in
FanOutOneBlockAsyncDFSOutputHelper.createOutput, waiting on a future forever
Key: HBASE-28119
URL: https://issues.apache.org/jira/browse/HBASE-28119
Project: HBase
Issue Type: Bug
Affects Versions: 2.2.7
Reporter: Li Chao
Attachments: image-2023-09-29-17-23-04-560.png
We found this problem in our production environment: the LogRoller gets stuck in
FanOutOneBlockAsyncDFSOutputHelper.createOutput, waiting on a future forever.
!image-2023-09-29-17-23-04-560.png|width=566,height=191!
Checking the regionserver's log, the regionserver started a SASL negotiation
with two datanodes, but only one completed the QOP check. The other did nothing
after connecting to the datanode.
{code:java}
518415 2023-04-17 14:17:25,434 INFO io.transwarp.guardian.client.cache.PeriodCacheUpdater: Fetch change version: 0
518416 2023-04-17 14:17:29,092 DEBUG org.apache.hadoop.hbase.ScheduledChore: RefreshCredentials execution time: 0 ms.
518417 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
518418 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionThroughputTuner execution time: 0 ms.
518419 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
518420 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: gy-dmz-swrzjzcc-gx-2-19,60020,1677341424491-HeapMemoryTunerChore execution time: 0 ms.
518421 2023-04-17 14:17:39,375 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: WAL AsyncFSWAL gy-dmz-swrzjzcc-gx-2-19%2C60020%2C1677341424491:(num 1681711899342) roll requested
518422 2023-04-17 14:17:39,389 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: SASL client doing general handshake for addr = 10.179.157.10/10.179.157.10, datanodeId = DatanodeInfoWithStorage[10.179.157.10:50010,DS-4815c34a-8d0c-42b9-b56c-529d2732d956,DISK]
518423 2023-04-17 14:17:39,391 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: SASL client doing general handshake for addr = 10.179.157.29/10.179.157.29, datanodeId = DatanodeInfoWithStorage[10.179.157.29:50010,DS-509f84fe-2e88-403e-87b5-f4765e49094f,DISK]
518424 2023-04-17 14:17:39,392 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: Verifying QOP, requested QOP = [auth], negotiated QOP = auth
518425 2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
518426 2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
518427 2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
518428 2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
518429 2023-04-17 14:17:55,492 INFO {code}
FanOutOneBlockAsyncDFSOutputHelper.createOutput connects to each datanode and
then calls trySaslNegotiate. In SASL authentication mode, a
SaslNegotiateHandler handles the authentication. If the datanode shuts down at
this point, SaslNegotiateHandler.channelInactive never completes the promise,
so the future stays stuck forever.
{code:java}
@Override
public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
  ctx.write(ctx.alloc().buffer(4).writeInt(SASL_TRANSFER_MAGIC_NUMBER));
  sendSaslMessage(ctx, new byte[0]);
  ctx.flush();
  step++;
}

@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
  // The SASL client is disposed, but the promise is never completed,
  // so any caller blocked on the future waits forever.
  saslClient.dispose();
} {code}
So SaslNegotiateHandler.channelInactive should call promise.tryFailure, so the
future fails fast instead of staying stuck forever.
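The hang can be reproduced in miniature with a plain CompletableFuture standing in for the Netty promise that createOutput blocks on (the class and method names below are illustrative, not from the HBase code): if channel teardown never completes the promise, the waiter blocks indefinitely, while failing the promise releases the waiter immediately with an IOException.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SaslPromiseDemo {

    /**
     * Buggy path: channelInactive only disposes resources and the promise
     * is left pending, so a bounded get() times out. An unbounded get(),
     * as in createOutput, would block forever -- the LogRoller symptom.
     */
    static boolean buggyPathHangs() {
        CompletableFuture<Void> promise = new CompletableFuture<>();
        try {
            promise.get(200, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // waiter is stuck
        } catch (InterruptedException | ExecutionException e) {
            return false;
        }
    }

    /**
     * Fixed path: channel teardown fails the promise, so the waiter is
     * released right away with an IOException instead of hanging.
     */
    static boolean fixedPathFailsFast() {
        CompletableFuture<Void> promise = new CompletableFuture<>();
        promise.completeExceptionally(new IOException("connection to datanode closed"));
        try {
            promise.get(200, TimeUnit.MILLISECONDS);
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof IOException; // failed fast
        } catch (InterruptedException | TimeoutException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy path hangs: " + buggyPathHangs());
        System.out.println("fixed path fails fast: " + fixedPathFailsFast());
    }
}
```

Netty's Promise.tryFailure plays the role of completeExceptionally here: it marks the promise as failed if it is not already done, waking any thread blocked on the corresponding future.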
--
This message was sent by Atlassian Jira
(v8.20.10#820010)