[ 
https://issues.apache.org/jira/browse/HBASE-28119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Chao updated HBASE-28119:
----------------------------
    Attachment: HBASE-28119-1.patch
      Assignee: Li Chao
        Status: Patch Available  (was: Open)

> LogRoller stuck in FanOutOneBlockAsyncDFSOutputHelper.createOutput waiting 
> on a future forever
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28119
>                 URL: https://issues.apache.org/jira/browse/HBASE-28119
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.2.7
>            Reporter: Li Chao
>            Assignee: Li Chao
>            Priority: Major
>         Attachments: HBASE-28119-1.patch, HBASE-28119.patch, 
> image-2023-09-29-17-23-04-560.png
>
>
> We found this problem in production: LogRoller gets stuck in 
> FanOutOneBlockAsyncDFSOutputHelper.createOutput, waiting forever on a future.
> !image-2023-09-29-17-23-04-560.png|width=566,height=191!
> According to the RegionServer's log, the RegionServer started SASL 
> negotiation with two DataNodes, but only one negotiation completed. The 
> other connection did nothing after connecting to the DataNode.
> {code:java}
> 518415 2023-04-17 14:17:25,434 INFO 
> io.transwarp.guardian.client.cache.PeriodCacheUpdater: Fetch change version: 0
> 518416 2023-04-17 14:17:29,092 DEBUG org.apache.hadoop.hbase.ScheduledChore: 
> RefreshCredentials execution time: 0 ms.
> 518417 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: 
> CompactionChecker execution time: 0 ms.
> 518418 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: 
> CompactionThroughputTuner execution time: 0 ms.
> 518419 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: 
> MemstoreFlusherChore execution time: 0 ms.
> 518420 2023-04-17 14:17:29,768 DEBUG org.apache.hadoop.hbase.ScheduledChore: 
> gy-dmz-swrzjzcc-gx-2-19,60020,1677341424491-HeapMemoryTunerChore 
> execution time: 0 ms.
> 518421 2023-04-17 14:17:39,375 DEBUG 
> org.apache.hadoop.hbase.regionserver.LogRoller: WAL AsyncFSWAL 
> gy-dmz-swrzjzcc-gx-2-19%2C60020%2C1677341424491:(num 1681711899342) 
> roll requested
> 518422 2023-04-17 14:17:39,389 DEBUG 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: 
> SASL client doing general handshake for addr = 
> 10.179.157.10/10.179.157.10, datanodeId = 
> DatanodeInfoWithStorage[10.179.157.10:50010,DS-4815c34a-8d0c-42b9-b56c-529d2732d956,DISK]
> 518423 2023-04-17 14:17:39,391 DEBUG 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: 
> SASL client doing general handshake for addr = 
> 10.179.157.29/10.179.157.29, datanodeId = 
> DatanodeInfoWithStorage[10.179.157.29:50010,DS-509f84fe-2e88-403e-87b5-f4765e49094f,DISK]
> 518424 2023-04-17 14:17:39,392 DEBUG 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: 
> Verifying QOP, requested QOP = [auth], negotiated QOP = auth
> 518425 2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: 
> MemstoreFlusherChore execution time: 0 ms.
> 518426 2023-04-17 14:17:39,743 DEBUG org.apache.hadoop.hbase.ScheduledChore: 
> CompactionChecker execution time: 0 ms.
> 518427 2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: 
> CompactionChecker execution time: 0 ms.
> 518428 2023-04-17 14:17:49,977 DEBUG org.apache.hadoop.hbase.ScheduledChore: 
> MemstoreFlusherChore execution time: 0 ms.
> 518429 2023-04-17 14:17:55,492 INFO
> {code}
> FanOutOneBlockAsyncDFSOutputHelper.createOutput connects to each DataNode 
> and calls trySaslNegotiate. In SASL authentication mode, 
> SaslNegotiateHandler handles the authentication. If the DataNode is shut 
> down, SaslNegotiateHandler.channelInactive does not complete the promise, 
> so the future is stuck forever.
> {code:java}
> @Override
> public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
>   ctx.write(ctx.alloc().buffer(4).writeInt(SASL_TRANSFER_MAGIC_NUMBER));
>   sendSaslMessage(ctx, new byte[0]);
>   ctx.flush();
>   step++;
> }
> @Override
> public void channelInactive(ChannelHandlerContext ctx) throws Exception {
>   saslClient.dispose();
> } {code}
> So SaslNegotiateHandler.channelInactive should call promise.tryFailure so 
> that the future fails instead of being stuck forever.
>  
>  
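
The hang and the proposed fix can be illustrated without Netty. The sketch 
below is not the HBase code and not the attached patch: it uses 
java.util.concurrent.CompletableFuture as a stand-in for Netty's Promise, and 
completeExceptionally as a stand-in for promise.tryFailure. The names 
SaslInactiveSketch, NegotiateHandler, failPromiseOnInactive and 
waiterReleased are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SaslInactiveSketch {

  /** Hypothetical stand-in for a handler that owns a negotiation promise. */
  static class NegotiateHandler {
    private final CompletableFuture<Void> promise;
    private final boolean failPromiseOnInactive;

    NegotiateHandler(CompletableFuture<Void> promise, boolean failPromiseOnInactive) {
      this.promise = promise;
      this.failPromiseOnInactive = failPromiseOnInactive;
    }

    /** Called when the connection closes, e.g. because the peer shut down. */
    void channelInactive() {
      // Resource cleanup (saslClient.dispose() in the real handler) would go here.
      if (failPromiseOnInactive) {
        // Has no effect if the promise is already complete, similar in
        // spirit to Netty's tryFailure.
        promise.completeExceptionally(
            new IOException("Connection closed during SASL negotiation"));
      }
    }
  }

  /** Returns true if a caller waiting on the promise is released in time. */
  static boolean waiterReleased(boolean failPromiseOnInactive) {
    CompletableFuture<Void> promise = new CompletableFuture<>();
    new NegotiateHandler(promise, failPromiseOnInactive).channelInactive();
    try {
      // The timeout exists only so this demo terminates; a caller that
      // waits without a timeout would block here forever.
      promise.get(200, TimeUnit.MILLISECONDS);
      return true;
    } catch (ExecutionException e) {
      return true; // failed fast: the caller can clean up and retry
    } catch (TimeoutException e) {
      return false; // nobody completed the promise
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("without fix, waiter released: " + waiterReleased(false));
    System.out.println("with fix, waiter released: " + waiterReleased(true));
  }
}
```

Under these assumptions, the first call reports false (the waiter only gets 
out because of the demo timeout) and the second reports true, because the 
close event now fails the promise and the waiting caller sees an exception.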



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
