[ https://issues.apache.org/jira/browse/HDDS-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Andika updated HDDS-10932:
-------------------------------
Description:
See RATIS-2089 for the context.
{quote}In Ozone's XceiverClientRatis#watchForCommit, there are two watch-for-commit
requests with different ReplicationLevels:
# Watch for ALL_COMMITTED
# Watch for MAJORITY_COMMITTED (If the previous watch threw an exception)
Based on the result of the second watch request, the client removes the UUIDs of
failed datanodes from the commitInfoMap.
The second watch might not be necessary, since an entry in
AbstractCommitWatcher.commitIndexMap already implies that the PutBlock request has
been committed to a majority of the servers. From my understanding, the second
MAJORITY_COMMITTED watch only serves to gather the information needed to remove
entries from commitInfoMap.
If the first watch fails with NotReplicatedException, we might be able to
remove the need for a second watch request. Since NotReplicatedException is a
Raft server exception, we can include the CommitInfoProtos in the
NotReplicatedException. The client can then use these CommitInfoProtos to remove
the corresponding entries from commitInfoMap without sending another WATCH request.
{quote}
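To make the current behavior concrete, here is a minimal sketch of the two-request flow described in the quote. All types in it (ReplicationLevel, CommitInfo, WatchFailedException, SimulatedGroup, CurrentWatchFlow) are hypothetical local stand-ins, not the real Ozone or Ratis classes, and commitInfoMap is modeled as a plain map from a datanode ID to its last known commit index. It only models the control flow: one ALL_COMMITTED watch, and on failure a second MAJORITY_COMMITTED watch whose reply is used to prune lagging datanodes.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical stand-in for the Ratis replication levels used by the watch. */
enum ReplicationLevel { ALL_COMMITTED, MAJORITY_COMMITTED }

/** Hypothetical stand-in for a CommitInfoProto: (datanode, commit index). */
record CommitInfo(String datanodeId, long commitIndex) {}

class WatchFailedException extends RuntimeException {
  WatchFailedException(String msg) { super(msg); }
}

/** Simulated Raft group: each datanode's current commit index. */
class SimulatedGroup {
  final Map<String, Long> commitIndexes;
  int watchRequests = 0; // counts every watch sent to the "leader"

  SimulatedGroup(Map<String, Long> commitIndexes) { this.commitIndexes = commitIndexes; }

  /** Returns all commit infos if the level is satisfied for index, else throws. */
  List<CommitInfo> watch(long index, ReplicationLevel level) {
    watchRequests++;
    long satisfied = commitIndexes.values().stream().filter(c -> c >= index).count();
    boolean ok = level == ReplicationLevel.ALL_COMMITTED
        ? satisfied == commitIndexes.size()
        : satisfied > commitIndexes.size() / 2;
    if (!ok) {
      throw new WatchFailedException("watch " + level + " failed for index " + index);
    }
    return commitIndexes.entrySet().stream()
        .map(e -> new CommitInfo(e.getKey(), e.getValue()))
        .collect(Collectors.toList());
  }
}

class CurrentWatchFlow {
  /** Two-request flow: the second watch exists only to find lagging datanodes. */
  static void watchForCommit(SimulatedGroup group, long index, Map<String, Long> commitInfoMap) {
    try {
      group.watch(index, ReplicationLevel.ALL_COMMITTED);
    } catch (WatchFailedException e) {
      // Second round trip, issued only to learn which datanodes are behind.
      for (CommitInfo info : group.watch(index, ReplicationLevel.MAJORITY_COMMITTED)) {
        if (info.commitIndex() < index) {
          commitInfoMap.remove(info.datanodeId());
        }
      }
    }
  }
}
```

With three datanodes where one lags behind the watched index, this flow sends two watch requests before the lagging datanode is removed from commitInfoMap.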
We can use the CommitInfoProtos that RATIS-2089 adds to NotReplicatedException to
remove the need for the MAJORITY_COMMITTED watch call whenever the DN Ratis leader
throws NotReplicatedException.
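A minimal sketch of the proposed single-request flow follows. CommitInfo, NotReplicatedExceptionStub, SimulatedLeader and ProposedWatchFlow are hypothetical stand-ins (the stub is only modeled loosely on the Ratis NotReplicatedException after RATIS-2089 and does not reproduce its API), and commitInfoMap is again a plain map from datanode ID to commit index. The server-side watch timeout is modeled as an immediate throw. The point is that the client prunes lagging datanodes from the commit infos carried by the exception, so no MAJORITY_COMMITTED watch is sent.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a CommitInfoProto: (datanode, commit index). */
record CommitInfo(String datanodeId, long commitIndex) {}

/** Hypothetical stand-in for a NotReplicatedException that carries commit infos. */
class NotReplicatedExceptionStub extends RuntimeException {
  private final List<CommitInfo> commitInfos;

  NotReplicatedExceptionStub(long index, List<CommitInfo> commitInfos) {
    super("index " + index + " is not committed on all servers");
    this.commitInfos = commitInfos;
  }

  List<CommitInfo> getCommitInfos() { return commitInfos; }
}

/** Simulated DN Ratis leader that answers ALL_COMMITTED watches. */
class SimulatedLeader {
  private final Map<String, Long> commitIndexes;
  int watchRequests = 0;

  SimulatedLeader(Map<String, Long> commitIndexes) { this.commitIndexes = commitIndexes; }

  void watchAllCommitted(long index) {
    watchRequests++;
    List<CommitInfo> infos = commitIndexes.entrySet().stream()
        .map(e -> new CommitInfo(e.getKey(), e.getValue()))
        .collect(Collectors.toList());
    // Models the server-side watch timing out before every follower catches up:
    // the leader attaches its view of each server's commit index.
    if (infos.stream().anyMatch(i -> i.commitIndex() < index)) {
      throw new NotReplicatedExceptionStub(index, infos);
    }
  }
}

class ProposedWatchFlow {
  /** Single-request flow: prune using the commit infos from the exception. */
  static void watchForCommit(SimulatedLeader leader, long index, Map<String, Long> commitInfoMap) {
    try {
      leader.watchAllCommitted(index);
    } catch (NotReplicatedExceptionStub e) {
      for (CommitInfo info : e.getCommitInfos()) {
        if (info.commitIndex() < index) {
          commitInfoMap.remove(info.datanodeId());
        }
      }
      // No MAJORITY_COMMITTED watch is sent: the commitIndexMap entry already
      // implies majority commit, and the exception supplied the lagging servers.
    }
  }
}
```

For the same three-datanode scenario, this sketch reaches the same commitInfoMap state with one watch request instead of two.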
This also requires configuring the DN Ratis server watch timeout
(hdds.ratis.raft.server.watch.timeout) to be lower than the client watch timeout
(hdds.ratis.raft.client.rpc.watch.request.timeout), so that the client receives
NotReplicatedException instead of TimeoutException.
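The timeout requirement can be read as "whichever timer fires first decides what the client sees". The sketch below encodes that reasoning with the two configuration keys named above; the WatchTimeoutCheck class and its expectedFailure helper are hypothetical, the durations in the usage note are illustrative values (not Ozone defaults), and the model ignores network latency, which in practice argues for a comfortable margin between the two values.

```java
import java.time.Duration;
import java.util.Map;

class WatchTimeoutCheck {
  // Configuration keys named in this issue.
  static final String SERVER_WATCH_TIMEOUT = "hdds.ratis.raft.server.watch.timeout";
  static final String CLIENT_WATCH_TIMEOUT = "hdds.ratis.raft.client.rpc.watch.request.timeout";

  /**
   * Hypothetical helper: which failure a client would observe for a watch that
   * never reaches ALL_COMMITTED, under a simplified "first timer wins" model.
   */
  static String expectedFailure(Map<String, Duration> conf) {
    Duration server = conf.get(SERVER_WATCH_TIMEOUT);
    Duration client = conf.get(CLIENT_WATCH_TIMEOUT);
    // The server must give up strictly first, so its NotReplicatedException
    // (carrying the commit infos) reaches the client before the client timer
    // expires; a tie is treated as a race the client may lose.
    return server.compareTo(client) < 0 ? "NotReplicatedException" : "TimeoutException";
  }
}
```

For example, a 10-second server watch timeout with a 30-second client watch request timeout yields NotReplicatedException, while equal values of 30 seconds each fall back to TimeoutException in this model.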
was:
See RATIS-2089 for the context.
{quote}In Ozone's XceiverClientRatis#watchForCommit, there are two watch
commits request with different ReplicationLevel
# Watch for ALL_COMMITTED
# Watch for MAJORITY_COMMITTED (If the previous watch threw an exception)
Based on the second watch request, the client will remove some failed datanode
UUID from the commitInfoMap.
The second watch might not be necessary since the entries in
AbstractCommitWatcher.commitIndexMap implies that the PutBlock request has been
committed to the majority of the servers. Therefore, another MAJORITY_COMMITTED
watch might not be necessary. From my understanding, the second
MAJORITY_COMMITTED only serves to gain information to remove entries from
commitInfoMap.
If the first watch failed with NotReplicatedException, we might be able to
remove the need to a second watch request. Since NotReplicatedException is a
Raft server exception, we can include the CommitInfoProtos in the
NotReplicatedException. The client can use this CommitInfoProtos to remove the
entry from commitInfoMap without sending another WATCH request.
{quote}
We can CommitInfoProto in NotReplicatedException introduced in RATIS-2089 to
remove the need for watch MAJORITY_COMMITTED calls if NotReplicatedException is
thrown from the DN Ratis leader.
This also requires DN Ratis server watch timeout configuration change
hdds.ratis.raft.server.watch.timeout to be lower than the client watch timeout
hdds.ratis.raft.client.rpc.watch.request.timeout so that NotReplicatedException
will be thrown instead of TimeoutException.
> Reduce number of watch requests by using CommitInfoProto from
> NotReplicatedException
> -------------------------------------------------------------------------------------
>
> Key: HDDS-10932
> URL: https://issues.apache.org/jira/browse/HDDS-10932
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Client
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
> Attachments: idea_1.patch
--
This message was sent by Atlassian Jira
(v8.20.10#820010)