[
https://issues.apache.org/jira/browse/HDDS-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-10932:
-------------------------------
Attachment: idea_1.patch
> Reduce number of watch requests by using CommitInfoProto from
> NotReplicatedException
> -------------------------------------------------------------------------------------
>
> Key: HDDS-10932
> URL: https://issues.apache.org/jira/browse/HDDS-10932
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Client
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
> Attachments: idea_1.patch
>
>
> See RATIS-2089 for the context.
> {quote}In Ozone's XceiverClientRatis#watchForCommit, there are two watch
> requests with different ReplicationLevel values:
> # Watch for ALL_COMMITTED
> # Watch for MAJORITY_COMMITTED (If the previous watch threw an exception)
> Based on the reply to the second watch request, the client removes the UUIDs
> of failed datanodes from the commitInfoMap.
> The second watch might not be necessary, since an entry in
> AbstractCommitWatcher.commitIndexMap implies that the PutBlock request has
> already been committed to a majority of the servers. From my understanding,
> the second MAJORITY_COMMITTED watch only serves to gather the information
> needed to remove entries from commitInfoMap.
> If the first watch fails with NotReplicatedException, we might be able to
> remove the need for a second watch request. Since NotReplicatedException is a
> Raft server exception, we can include the CommitInfoProtos in the
> NotReplicatedException. The client can use these CommitInfoProtos to remove
> the entries from commitInfoMap without sending another WATCH request.
> {quote}
> We can use the CommitInfoProto carried by NotReplicatedException (introduced
> in RATIS-2089) to remove the need for MAJORITY_COMMITTED watch calls when
> NotReplicatedException is thrown from the DN Ratis leader.
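
A minimal standalone sketch of the idea described above, not the actual Ozone or Ratis code. The types CommitInfo and NotReplicated and the method pruneLaggingServers are hypothetical stand-ins for CommitInfoProto, NotReplicatedException and the client-side cleanup of commitInfoMap. The rule "a server is lagging if its commit index is below the watched log index" is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Standalone sketch; none of these types are real Ratis or Ozone classes. */
final class WatchFallbackSketch {

  /** Hypothetical stand-in for one CommitInfoProto entry. */
  static final class CommitInfo {
    final String serverId;
    final long commitIndex;

    CommitInfo(String serverId, long commitIndex) {
      this.serverId = serverId;
      this.commitIndex = commitIndex;
    }
  }

  /** Hypothetical stand-in for a NotReplicatedException that carries commit infos. */
  static final class NotReplicated extends Exception {
    final long logIndex;
    final List<CommitInfo> commitInfos;

    NotReplicated(long logIndex, List<CommitInfo> commitInfos) {
      super("log index " + logIndex + " is not replicated to all servers");
      this.logIndex = logIndex;
      this.commitInfos = commitInfos;
    }
  }

  /**
   * Removes from commitInfoMap every server whose reported commit index is
   * below the watched log index (assumed definition of "lagging"), using only
   * the data carried by the exception, so no second watch request is needed.
   * Returns the ids of the removed servers.
   */
  static Set<String> pruneLaggingServers(Map<String, Long> commitInfoMap, NotReplicated e) {
    Set<String> lagging = e.commitInfos.stream()
        .filter(info -> info.commitIndex < e.logIndex)
        .map(info -> info.serverId)
        .collect(Collectors.toSet());
    commitInfoMap.keySet().removeAll(lagging);
    return lagging;
  }

  public static void main(String[] args) {
    // Three datanodes; dn3 has not caught up to the watched index 10.
    Map<String, Long> commitInfoMap = new HashMap<>();
    commitInfoMap.put("dn1", 10L);
    commitInfoMap.put("dn2", 10L);
    commitInfoMap.put("dn3", 4L);

    NotReplicated e = new NotReplicated(10L, List.of(
        new CommitInfo("dn1", 10L),
        new CommitInfo("dn2", 10L),
        new CommitInfo("dn3", 4L)));

    System.out.println("removed: " + pruneLaggingServers(commitInfoMap, e));
    System.out.println("remaining: " + commitInfoMap.keySet());
  }
}
```

In this sketch the exception handler for the ALL_COMMITTED watch would call pruneLaggingServers directly, where the current flow issues a MAJORITY_COMMITTED watch to obtain the same commit indexes.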
> This also requires changing the DN Ratis server watch timeout,
> hdds.ratis.raft.server.watch.timeout, to be lower than the client watch
> timeout, hdds.ratis.raft.client.rpc.watch.request.timeout, so that the
> server-side NotReplicatedException is thrown before the client-side watch
> times out with a TimeoutException.
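
The timeout ordering above can be expressed as a simple sanity check. This is a sketch only: the two configuration keys come from the description, but the class WatchTimeoutCheck, the method serverTimesOutFirst, the plain Map used in place of a real Ozone configuration object, and the 10 s / 30 s values are all illustrative assumptions, not Ozone defaults.

```java
import java.time.Duration;
import java.util.Map;

/** Standalone sketch; uses a plain Map instead of a real Ozone configuration object. */
final class WatchTimeoutCheck {
  static final String SERVER_KEY = "hdds.ratis.raft.server.watch.timeout";
  static final String CLIENT_KEY = "hdds.ratis.raft.client.rpc.watch.request.timeout";

  /**
   * Returns true when the server watch timeout is strictly lower than the
   * client watch timeout, i.e. the server can answer with
   * NotReplicatedException before the client request itself times out.
   */
  static boolean serverTimesOutFirst(Map<String, Duration> conf) {
    Duration server = conf.get(SERVER_KEY);
    Duration client = conf.get(CLIENT_KEY);
    if (server == null || client == null) {
      throw new IllegalArgumentException("both watch timeouts must be set");
    }
    return server.compareTo(client) < 0;
  }

  public static void main(String[] args) {
    // Illustrative values only, not the Ozone defaults.
    Map<String, Duration> conf = Map.of(
        SERVER_KEY, Duration.ofSeconds(10),
        CLIENT_KEY, Duration.ofSeconds(30));
    System.out.println("server times out first: " + serverTimesOutFirst(conf));
  }
}
```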