[ https://issues.apache.org/jira/browse/RATIS-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792320#comment-17792320 ]

Ivan Andika commented on RATIS-1947:
------------------------------------

> Did it close the server division?

Yes, the server division was closed. When I traced the logic, it originates 
from RaftServerProxy#groupRemoveAsync. Is it because the Ozone 
ContainerStateMachine closes the Ratis pipeline through the Group Management API 
when it handles notifyLogFailed? Or does Ratis automatically remove the group 
if there is a log failure? (A hedged sketch of the first possibility follows 
the log below.)
{code:java}
2023-11-29 14:05:15,729 [Command processor thread] INFO 
org.apache.ratis.server.RaftServer$Division: 
290b61a8-4094-43d0-a6bd-9bccb142d26f@group-B9498E0AD07E: shutdown
2023-11-29 14:05:15,729 [Command processor thread] INFO 
org.apache.ratis.util.JmxRegister: Successfully un-registered JMX Bean with 
object name 
Ratis:service=RaftServer,group=group-B9498E0AD07E,id=290b61a8-4094-43d0-a6bd-9bccb142d26f
2023-11-29 14:05:15,729 [Command processor thread] INFO 
org.apache.ratis.server.impl.RoleInfo: 290b61a8-4094-43d0-a6bd-9bccb142d26f: 
shutdown 290b61a8-4094-43d0-a6bd-9bccb142d26f@group-B9498E0AD07E-LeaderStateImpl
2023-11-29 14:05:15,730 [Command processor thread] INFO 
org.apache.ratis.server.impl.PendingRequests: 
290b61a8-4094-43d0-a6bd-9bccb142d26f@group-B9498E0AD07E-PendingRequests: 
sendNotLeaderResponses
2023-11-29 14:05:15,730 
[290b61a8-4094-43d0-a6bd-9bccb142d26f@group-B9498E0AD07E->1669a7e6-fe3c-4f7e-8fcb-ec5d5027b0eb-GrpcLogAppender-LogAppenderDaemon]
 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
290b61a8-4094-43d0-a6bd-9bccb142d26f@group-B9498E0AD07E->1669a7e6-fe3c-4f7e-8fcb-ec5d5027b0eb-GrpcLogAppender:
 Wait interrupted by java.lang.InterruptedException
2023-11-29 14:05:15,730 
[290b61a8-4094-43d0-a6bd-9bccb142d26f@group-B9498E0AD07E->74012c6e-d9de-4e9d-811a-05efe6a1e47d-GrpcLogAppender-LogAppenderDaemon]
 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
290b61a8-4094-43d0-a6bd-9bccb142d26f@group-B9498E0AD07E->74012c6e-d9de-4e9d-811a-05efe6a1e47d-GrpcLogAppender:
 Wait interrupted by java.lang.InterruptedException {code}
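For context, here is a minimal sketch of how a state machine callback could 
close its own division through the Group Management API. This is an assumption 
about the code path, not the actual Ozone implementation; the class and wiring 
below are hypothetical, and only GroupManagementApi#remove routing to 
RaftServerProxy#groupRemoveAsync comes from Ratis.
{code:java}
// Hypothetical sketch, not the actual Ozone code path: a log-failure
// handler that removes the local division via the Group Management API.
import java.io.IOException;

import org.apache.ratis.client.RaftClient;
import org.apache.ratis.protocol.RaftGroupId;
import org.apache.ratis.protocol.RaftPeerId;

public class LogFailureHandler {
  private final RaftClient client;      // client addressing the local server
  private final RaftPeerId localPeerId; // this datanode's own peer id

  public LogFailureHandler(RaftClient client, RaftPeerId localPeerId) {
    this.client = client;
    this.localPeerId = localPeerId;
  }

  // Imagined to be invoked from the state machine's notifyLogFailed callback.
  public void onLogFailed(RaftGroupId groupId) throws IOException {
    // GroupManagementApi#remove is served by RaftServerProxy#groupRemoveAsync,
    // which shuts the division down locally, producing the "shutdown" log above.
    client.getGroupManagementApi(localPeerId)
        .remove(groupId, /* deleteDirectory */ false, /* renameDirectory */ true);
  }
}
{code}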
I noticed there were a lot of GroupMismatchExceptions afterwards, which from 
my understanding should be because the division for that group has been closed.
{code:java}
2023-11-29 14:05:08,751 [grpc-default-executor-8201] WARN 
org.apache.ratis.grpc.server.GrpcLogAppender: 
290b61a8-4094-43d0-a6bd-9bccb142d26f@group-B9498E0AD07E->74012c6e-d9de-4e9d-811a-05efe6a1e47d-AppendLogResponseHandler:
 Failed appendEntries
org.apache.ratis.protocol.exceptions.GroupMismatchException: 
74012c6e-d9de-4e9d-811a-05efe6a1e47d: group-B9498E0AD07E not found. {code}
Another thing is that since the pipeline close action did not trigger a 
heartbeat straightaway, each datanode in the pipeline might close the group at 
a completely different time. From my understanding of Raft group remove, it is 
a local operation and does not propagate to the other Raft peers in the group. 
This could cause a Raft server to still believe that it is in an active Raft 
group and keep sending AppendEntries / RequestVote. Will this be a problem? 
(See the probe sketch after the log below.)
{code:java}
# dn1
2023-11-29 15:22:59,477 [Command processor thread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler:
 Close Pipeline PipelineID=23e46782-6b48-4559-b3ac-0f95993cf0bc command on 
datanode 1669a7e6-fe3c-4f7e-8fcb-ec5d5027b0eb.

# dn5
2023-11-29 14:07:55,442 [Command processor thread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler:
 Close Pipeline PipelineID=23e46782-6b48-4559-b3ac-0f95993cf0bc command on 
datanode bd1e72ab-cfd5-4cc1-8fbf-6ec9d9654c98.

# dn8 
2023-11-29 16:57:53,894 [Command processor thread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler:
 Close Pipeline PipelineID=23e46782-6b48-4559-b3ac-0f95993cf0bc command on 
datanode 4a23d1e8-d526-4a4d-8ed1-13ffbab3a5cc. {code}
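To make the question concrete, here is a hedged sketch of how the divergence 
could be observed from the client side. It assumes GroupManagementApi#list; it 
is an illustration, not something Ozone actually does.
{code:java}
// Hedged sketch (assumption, not an Ozone code path): checking whether a
// peer still hosts a group after another peer has removed it locally.
import java.io.IOException;
import java.util.List;

import org.apache.ratis.client.RaftClient;
import org.apache.ratis.protocol.RaftGroupId;
import org.apache.ratis.protocol.RaftPeerId;

public final class GroupProbe {
  private GroupProbe() { }

  public static boolean stillHostsGroup(RaftClient client, RaftPeerId peerId,
      RaftGroupId groupId) throws IOException {
    // A peer that has already run groupRemove no longer lists the group, and
    // requests addressed to it fail with GroupMismatchException
    // ("group-... not found"), matching the log earlier in this comment.
    List<RaftGroupId> groups =
        client.getGroupManagementApi(peerId).list().getGroupIds();
    return groups.contains(groupId);
  }
}
{code}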
> Currently, it won't recover from RaftLog error.

I see. 

> raft.server.log.statemachine.data.sync  (default 10s)

> so the get(..) method will wait longer.

Thank you for the info. I will try increasing the sync timeout. 

Is it also advisable to increase the retry count 
(raft.server.log.statemachine.data.sync.timeout.retry)?
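
For reference, this is how I plan to bump the two settings programmatically; a 
minimal sketch, where the 30s / 3 values are only illustrative.
{code:java}
// Sketch: raising the state machine data sync timeout and retry count via
// RaftServerConfigKeys; the concrete values here are illustrative.
import java.util.concurrent.TimeUnit;

import org.apache.ratis.conf.RaftProperties;
import org.apache.ratis.server.RaftServerConfigKeys;
import org.apache.ratis.util.TimeDuration;

public final class SyncTimeoutConfig {
  public static RaftProperties withLongerSyncTimeout() {
    RaftProperties properties = new RaftProperties();
    // raft.server.log.statemachine.data.sync.timeout (default 10s)
    RaftServerConfigKeys.Log.StateMachineData.setSyncTimeout(
        properties, TimeDuration.valueOf(30, TimeUnit.SECONDS));
    // raft.server.log.statemachine.data.sync.timeout.retry (default -1,
    // i.e. retry indefinitely); a positive value bounds the retries.
    RaftServerConfigKeys.Log.StateMachineData.setSyncTimeoutRetry(properties, 3);
    return properties;
  }
}
{code}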

Also, I noticed that setSyncTimeoutRetry is called twice in 
XceiverServerRatis#newRaftProperties: first with the value nodeFailureTimeoutMs 
/ dataSyncTimeout.toIntExact(TimeUnit.MILLISECONDS), which is then overridden 
by the value from the Ozone configuration (default -1). Does this mean that 
WriteLog is retried indefinitely?
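
Paraphrasing the two calls as I read them; the method below is my 
reconstruction with hypothetical parameter names, not the verbatim Ozone code.
{code:java}
// Paraphrased sketch of XceiverServerRatis#newRaftProperties as I read it;
// names and wiring are mine, not the actual Ozone identifiers.
import java.util.concurrent.TimeUnit;

import org.apache.ratis.conf.RaftProperties;
import org.apache.ratis.server.RaftServerConfigKeys;
import org.apache.ratis.util.TimeDuration;

public final class RetryOverrideSketch {
  public static void applyRetries(RaftProperties properties,
      long nodeFailureTimeoutMs, TimeDuration dataSyncTimeout,
      int ozoneConfiguredRetry) {
    // First call: retry count derived from the node failure timeout.
    int computedRetry = (int) (nodeFailureTimeoutMs
        / dataSyncTimeout.toIntExact(TimeUnit.MILLISECONDS));
    RaftServerConfigKeys.Log.StateMachineData.setSyncTimeoutRetry(
        properties, computedRetry);

    // Second call: the Ozone configuration value (default -1) overrides the
    // computed one; -1 appears to mean WriteLog retries indefinitely.
    RaftServerConfigKeys.Log.StateMachineData.setSyncTimeoutRetry(
        properties, ozoneConfiguredRetry);
  }
}
{code}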

 

Thank you for all the help.

> TimeoutIOException in WriteLog might not release Pending Requests
> -----------------------------------------------------------------
>
>                 Key: RATIS-1947
>                 URL: https://issues.apache.org/jira/browse/RATIS-1947
>             Project: Ratis
>          Issue Type: Bug
>          Components: Leader
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>         Attachments: image-2023-12-01-12-04-24-409.png, 
> image-2023-12-01-13-28-24-125.png
>
>
> Recently there have been some I/O issues in our Ozone cluster's datanodes 
> that cause WriteLog to time out with the following exception:
> {code:java}
> 2023-11-27 18:05:36,593 
> [dac3eec1-7886-4278-b0db-d6b62a046d44@group-2BEC613EA29B-SegmentedRaftLogWorker]
>  WARN org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> Timeout 0/~
> org.apache.ratis.protocol.exceptions.TimeoutIOException: Timeout 10s: 
> WriteLog:337065866: (t:7, i:337065866), STATEMACHINELOGENTRY, cmdType: 
> WriteChunk traceID: "" containerID: 3470188 datanodeUuid: 
> "18b90e19-6a7d-4858-a907-4acdf04a3949" pipelineID: 
> "9de85e00-968d-4b32-a589-2bec613ea29b" writeChunk { blockID { containerID: 
> 3470188 localID: 107544269209449942 blockCommitSequenceId: 337065456 } 
> chunkData { chunkName: "107544269209449942_chunk_9" offset: 33554432 len: 
> 4194304 checksumData { type: CRC32 bytesPerChecksum: 1048576 checksums: 
> "\224\316\216\327" checksums: "\326\223/\201" checksums: "O\3374\336" 
> checksums: "\313\231Bf" } } }, container 
> path=/mnt/disk/3/ozone/hdds/CID-354bf147-15b8-4d31-9ff0-c0bce64451cf/current/containerDir121/3470188-writeStateMachineData
>         at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:89)
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StateMachineDataPolicy.getFromFuture(SegmentedRaftLogWorker.java:89)
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog.execute(SegmentedRaftLogWorker.java:561)
>         at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:321)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.TimeoutException
>         at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
>         at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>         at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:80)
>         ... 4 more {code}
> This translates to state machine data write timeouts shown in the metrics 
> dashboard:
> !image-2023-12-01-13-28-24-125.png!
> After these exceptions, the pending requests in the leader's queue appear 
> stuck and unable to return to normal, even after the I/O performance 
> recovers:
> !image-2023-12-01-12-04-24-409.png!
> Meanwhile, the leader pending request limit hits did not increase, which 
> means that no client requests failed with ResourceUnavailableException 
> (though there were previous instances when these limits were hit). It seems 
> that the client operations are blocked.
> My expectation is that once I/O performance has recovered, the leader's 
> pending requests should be removed, which would also release the 
> RequestLimits semaphore. Looking at LeaderStateImpl, there are some pending 
> request-related methods (a hedged sketch of this lifecycle follows the list):
>  * Acquiring pending requests: called in RaftServerImpl#appendTransaction 
> after the StateMachine (e.g. ContainerStateMachine) finishes building the 
> TransactionContext in StateMachine#startTransaction
>  ** tryAcquirePendingRequest: acquires the permit based on the current size 
> and number of pending requests
>  *** If the pending request size or count limit is hit, 
> ResourceUnavailableException should be thrown
>  ** addPendingRequest: using the permit, adds the client request to the 
> pending requests
>  * Releasing pending requests: called when applying the log to the state 
> machine (meaning the transaction / log entry has been successfully committed)
>  ** replyPendingRequest: releases the pending request element and size.
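> A hedged pseudocode sketch of that lifecycle; the three method names are 
> from LeaderStateImpl, but the signatures and wiring are simplified, not the 
> verbatim Ratis code.
> {code:java}
> // Hedged pseudocode; signatures simplified, not the verbatim Ratis code.
> // Leader side, in RaftServerImpl#appendTransaction:
> PendingRequests.Permit permit = leaderState.tryAcquirePendingRequest(message);
> if (permit == null) {
>   // element or byte-size limit hit
>   throw new ResourceUnavailableException("Request limit reached");
> }
> PendingRequest pending = leaderState.addPendingRequest(permit, request, context);
>
> // Later, when the committed entry is applied to the state machine, the
> // reply releases both the element count and the byte size:
> leaderState.replyPendingRequest(logIndex, stateMachineFuture);
> {code}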
> Looking at the code for a bit, I suspect that a TimeoutIOException from 
> WriteLog does not release the leader's pending requests, which seems to 
> cause some client operations to be blocked. However, I'm not entirely sure 
> of the expected behavior when TimeoutIOException is thrown (will it be 
> retried indefinitely?).
> Our Ozone deployment is still using a Ratis version based on the 2.4.1 
> release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
