[ https://issues.apache.org/jira/browse/RATIS-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813468#comment-17813468 ]

Ivan Andika commented on RATIS-1947:
------------------------------------

Just an update: we managed to resolve the TimeoutIOException. It was caused by the du commands in 
the datanode volume usage job, which built up a huge inode cache. The huge inode cache triggered 
frequent direct memory reclaim, which blocked the chunk writes and caused the TimeoutIOException. 
We have since migrated to the more lightweight DedicatedDiskSpaceUsage to prevent this; a rough 
sketch of the idea follows.
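
A minimal sketch of the idea behind the lightweight approach, for illustration only (this is not 
the actual Ozone DedicatedDiskSpaceUsage implementation, and the class name below is made up): 
instead of spawning du, which walks every file under the volume and keeps the inode cache hot, 
the used space is derived from filesystem-level counters.

{code:java}
import java.io.File;

/** Illustration only: derive volume usage from filesystem counters instead of running du. */
public class DiskSpaceUsageSketch {
  private final File volume;

  public DiskSpaceUsageSketch(File volume) {
    this.volume = volume;
  }

  /** Capacity minus free space: O(1), no directory walk, no inode cache pressure. */
  public long getUsedSpace() {
    return volume.getTotalSpace() - volume.getFreeSpace();
  }
}
{code}

In our case the switch was just a datanode disk usage factory configuration change 
(hdds.datanode.du.factory in our Ozone version, if I recall the key correctly).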

As for the stuck pending requests, one observation is that this only happens on a Raft peer of a 
closed pipeline that has not received the group remove command (similar to the case described in 
HDDS-9959). Probably the other Raft followers have already been closed, so the transaction log 
entry can never be committed and applied to the leader's state machine, and the pending requests 
can therefore never be completed. However, I was not able to find logs supporting this hypothesis 
(e.g. a stuck leader election on the unclosed leader) during the stuck pending request period, so 
this is just a guess. Maybe someone can help to explain more.

I'm closing this for now, and will reopen this if the problem arises again.

> TimeoutIOException in WriteLog might not release Pending Requests
> -----------------------------------------------------------------
>
>                 Key: RATIS-1947
>                 URL: https://issues.apache.org/jira/browse/RATIS-1947
>             Project: Ratis
>          Issue Type: Bug
>          Components: Leader
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>         Attachments: image-2023-12-01-12-04-24-409.png, 
> image-2023-12-01-13-28-24-125.png
>
>
> Recently there have been some I/O issues in our Ozone cluster's datanodes that 
> caused WriteLog to time out with the following exception:
> {code:java}
> 2023-11-27 18:05:36,593 [dac3eec1-7886-4278-b0db-d6b62a046d44@group-2BEC613EA29B-SegmentedRaftLogWorker] WARN org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: Timeout 0/~
> org.apache.ratis.protocol.exceptions.TimeoutIOException: Timeout 10s: WriteLog:337065866: (t:7, i:337065866), STATEMACHINELOGENTRY, cmdType: WriteChunk traceID: "" containerID: 3470188 datanodeUuid: "18b90e19-6a7d-4858-a907-4acdf04a3949" pipelineID: "9de85e00-968d-4b32-a589-2bec613ea29b" writeChunk { blockID { containerID: 3470188 localID: 107544269209449942 blockCommitSequenceId: 337065456 } chunkData { chunkName: "107544269209449942_chunk_9" offset: 33554432 len: 4194304 checksumData { type: CRC32 bytesPerChecksum: 1048576 checksums: "\224\316\216\327" checksums: "\326\223/\201" checksums: "O\3374\336" checksums: "\313\231Bf" } } }, container path=/mnt/disk/3/ozone/hdds/CID-354bf147-15b8-4d31-9ff0-c0bce64451cf/current/containerDir121/3470188-writeStateMachineData
>         at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:89)
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StateMachineDataPolicy.getFromFuture(SegmentedRaftLogWorker.java:89)
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog.execute(SegmentedRaftLogWorker.java:561)
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:321)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.TimeoutException
>         at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
>         at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>         at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:80)
>         ... 4 more
> {code}
> This translates to the state machine data write timeouts shown in the metrics 
> dashboard:
> !image-2023-12-01-13-28-24-125.png!
> After these exceptions, the Pending Requests count in the leader queue is stuck 
> and does not return to normal, even after the I/O performance recovers:
> !image-2023-12-01-12-04-24-409.png!
> Meanwhile, the leader pending request limit hits did not increase, which means 
> that no client requests failed with ResourceUnavailableException (although there 
> were earlier instances when these limits were hit). It seems that the client 
> operations are blocked instead.
> My expectation is that once I/O performance has recovered, the leader's pending 
> requests should be removed, which also releases the RequestLimits semaphore. 
> Looking at LeaderStateImpl, there are some pending-request-related methods 
> (a simplified sketch of this acquire/release flow follows the list):
>  * Acquiring pending requests: called in RaftServerImpl#appendTransaction after the StateMachine (e.g. ContainerStateMachine) finishes building the TransactionContext in StateMachine#startTransaction
>  ** tryAcquirePendingRequest: acquires the permit based on the current size and number of pending requests
>  *** If the pending request size or count limit is hit, ResourceUnavailableException should be thrown
>  ** addPendingRequest: using the permit, adds the client request to the pending requests
>  * Releasing pending requests: called when applying the log to the state machine (i.e. the transaction / log entry has been successfully committed)
>  ** replyPendingRequest: releases the pending request element and size permits
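> To make the flow above concrete, here is a minimal, hypothetical sketch of the 
> acquire/release pattern as I understand it. It is not the actual LeaderStateImpl / 
> RequestLimits code (the class and method names below are made up); it only 
> illustrates how permits can leak and block clients if the release path is never reached:
> {code:java}
> import java.util.concurrent.Semaphore;
> 
> // Illustration only: element-count and byte-size permits guarding the leader's pending requests.
> class PendingRequestLimitsSketch {
>   private final Semaphore elementPermits;
>   private final Semaphore sizePermits;
> 
>   PendingRequestLimitsSketch(int maxElements, int maxSizeBytes) {
>     this.elementPermits = new Semaphore(maxElements);
>     this.sizePermits = new Semaphore(maxSizeBytes);
>   }
> 
>   // Acquire path (appendTransaction): fail fast when either limit is hit.
>   boolean tryAcquire(int messageSizeBytes) {
>     if (!elementPermits.tryAcquire()) {
>       return false; // caller replies with ResourceUnavailableException
>     }
>     if (!sizePermits.tryAcquire(messageSizeBytes)) {
>       elementPermits.release();
>       return false;
>     }
>     return true;
>   }
> 
>   // Release path (applying the committed log entry): if this is never reached,
>   // the permits leak and subsequent client requests eventually block.
>   void release(int messageSizeBytes) {
>     elementPermits.release();
>     sizePermits.release(messageSizeBytes);
>   }
> }
> {code}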
> Looking at the code for a bit, I suspected that the TimeoutIOException in 
> WriteLog did not release the leader's pending requests, which seems to cause 
> some client operations to be blocked. However, I'm not entirely sure of the 
> expected behavior when TimeoutIOException is thrown (will it be retried 
> indefinitely?).
> Our Ozone version is still using the Ratis version based on the 2.4.1 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
