[
https://issues.apache.org/jira/browse/HBASE-28119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Li Chao updated HBASE-28119:
----------------------------
Attachment: HBASE-28119.patch
> LogRoller stuck by FanOutOneBlockAsyncDFSOutputHelper.createOutput waiting
> get future all time
> -----------------------------------------------------------------------------------------------
>
> Key: HBASE-28119
> URL: https://issues.apache.org/jira/browse/HBASE-28119
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 2.2.7
> Reporter: Li Chao
> Priority: Major
> Attachments: HBASE-28119.patch, image-2023-09-29-17-23-04-560.png
>
>
> We found this problem in production: LogRoller gets stuck because
> FanOutOneBlockAsyncDFSOutputHelper.createOutput waits forever on a future
> that never completes.
> !image-2023-09-29-17-23-04-560.png|width=566,height=191!
> The regionserver's log shows that the regionserver started SASL negotiation
> with two datanodes, but only one negotiation completed. The other connection
> did nothing further after connecting to the datanode.
> {code:java}
> 518415 2023-04-17 14:17:25,434 INFO io.transwarp.guardian.client.cache.PeriodCacheUpdater: Fetch change version: 0
> 518416 2023-04-17 14:17:29,092 DEBUG org.apache.hadoop.hbase.ScheduledChore: RefreshCredentials execution time: 0 ms.
> 518417 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
> 518418 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionThroughputTuner execution time: 0 ms.
> 518419 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
> 518420 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: gy-dmz-swrzjzcc-gx-2-19,60020,1677341424491-HeapMemoryTunerChore execution time: 0 ms.
> 518421 2023-04-17 14:17:39,375 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: WAL AsyncFSWAL gy-dmz-swrzjzcc-gx-2-19%2C60020%2C1677341424491:(num 1681711899342) roll requested
> 518422 2023-04-17 14:17:39,389 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: SASL client doing general handshake for addr = 10.179.157.10/10.179.157.10, datanodeId = DatanodeInfoWithStorage[10.179.157.10:50010,DS-4815c34a-8d0c-42b9-b56c-529d2732d956,DISK]
> 518423 2023-04-17 14:17:39,391 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: SASL client doing general handshake for addr = 10.179.157.29/10.179.157.29, datanodeId = DatanodeInfoWithStorage[10.179.157.29:50010,DS-509f84fe-2e88-403e-87b5-f4765e49094f,DISK]
> 518424 2023-04-17 14:17:39,392 DEBUG org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: Verifying QOP, requested QOP = [auth], negotiated QOP = auth
> 518425 2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
> 518426 2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
> 518427 2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
> 518428 2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
> 518429 2023-04-17 14:17:55,492 INFO
> {code}
> FanOutOneBlockAsyncDFSOutputHelper.createOutput connects to each datanode and
> calls trySaslNegotiate. In SASL authentication mode, SaslNegotiateHandler
> handles the authentication. If the datanode is shut down,
> SaslNegotiateHandler.channelInactive does not complete the promise, so the
> future stays stuck forever.
> {code:java}
> @Override
> public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
>   ctx.write(ctx.alloc().buffer(4).writeInt(SASL_TRANSFER_MAGIC_NUMBER));
>   sendSaslMessage(ctx, new byte[0]);
>   ctx.flush();
>   step++;
> }
>
> @Override
> public void channelInactive(ChannelHandlerContext ctx) throws Exception {
>   saslClient.dispose();
> }
> {code}
> So SaslNegotiateHandler.channelInactive should call promise.tryFailure so that
> the future does not stay stuck forever.
>
>
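The proposed fix can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use Netty: java.util.concurrent.CompletableFuture stands in for the Netty promise, completeExceptionally stands in for promise.tryFailure, and the class SaslNegotiateSketch, its methods, and the error message are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for SaslNegotiateHandler (assumption, not HBase code).
class SaslNegotiateSketch {
    // Models the promise that createOutput ultimately waits on.
    private final CompletableFuture<Void> promise;
    // Models saslClient.dispose() having been called.
    private boolean disposed;

    SaslNegotiateSketch(CompletableFuture<Void> promise) {
        this.promise = promise;
    }

    // Models a negotiation that finished normally.
    void negotiationSucceeded() {
        promise.complete(null);
    }

    // Models channelInactive: release resources AND fail the pending promise.
    void channelInactive() {
        disposed = true;
        // Like tryFailure, this is a no-op if the promise is already complete.
        promise.completeExceptionally(
            new IOException("Connection closed during SASL negotiation"));
    }

    boolean isDisposed() {
        return disposed;
    }

    // Models the waiting side; the timeout only keeps this demo from hanging.
    static String waitFor(CompletableFuture<Void> future) {
        try {
            future.get(1, TimeUnit.SECONDS);
            return "connected";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            return "stuck";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // Datanode goes away before negotiation finishes: the waiter is released.
        CompletableFuture<Void> closedEarly = new CompletableFuture<>();
        new SaslNegotiateSketch(closedEarly).channelInactive();
        System.out.println(waitFor(closedEarly));

        // Negotiation finished first: a later close does not change the result.
        CompletableFuture<Void> finished = new CompletableFuture<>();
        SaslNegotiateSketch handler = new SaslNegotiateSketch(finished);
        handler.negotiationSucceeded();
        handler.channelInactive();
        System.out.println(waitFor(finished));
    }
}
```

In this model, a handler that only disposes the SASL client on close would leave the first waiter reporting "stuck"; failing the promise on close turns the silent hang into an error the caller can handle, for example by retrying the roll.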
--
This message was sent by Atlassian Jira
(v8.20.10#820010)