[ 
https://issues.apache.org/jira/browse/HDDS-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-10932:
-------------------------------
    Description: 
See RATIS-2089 for the context.
{quote}In Ozone's XceiverClientRatis#watchForCommit, there are two watch 
commit requests with different ReplicationLevel values:
 # Watch for ALL_COMMITTED 
 # Watch for MAJORITY_COMMITTED (If the previous watch threw an exception)

Based on the second watch request, the client removes the UUIDs of failed 
datanodes from the commitInfoMap.

The second watch might not be necessary, since the entries in 
AbstractCommitWatcher.commitIndexMap imply that the PutBlock request has 
already been committed to a majority of the servers. From my understanding, 
the second MAJORITY_COMMITTED watch only serves to gather the information 
needed to remove entries from commitInfoMap.

If the first watch fails with NotReplicatedException, we might be able to 
remove the need for a second watch request. Since NotReplicatedException is a 
Raft server exception, we can include the CommitInfoProtos in the 
NotReplicatedException. The client can then use these CommitInfoProtos to 
remove the entries from commitInfoMap without sending another WATCH request. 
{quote}
We can use the CommitInfoProto carried by NotReplicatedException (introduced 
in RATIS-2089) to avoid the MAJORITY_COMMITTED watch call when 
NotReplicatedException is thrown by the DN Ratis leader.
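
The idea above can be sketched roughly as follows. This is only an 
illustration under stated assumptions: WatchFallbackSketch, CommitInfo, 
NotReplicatedSketchException and pruneLaggingDatanodes are hypothetical 
stand-ins, not the real Ratis or Ozone classes, and the rule "drop a datanode 
whose commit index is behind the watched index" is an assumed criterion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only: these are NOT the real Ratis or Ozone classes.
final class WatchFallbackSketch {

  /** Simplified stand-in for one CommitInfoProto entry. */
  static final class CommitInfo {
    final String datanodeId;
    final long commitIndex;

    CommitInfo(String datanodeId, long commitIndex) {
      this.datanodeId = datanodeId;
      this.commitIndex = commitIndex;
    }
  }

  /** Simplified stand-in for a NotReplicatedException that carries commit infos. */
  static final class NotReplicatedSketchException extends Exception {
    final long watchedIndex;
    final List<CommitInfo> commitInfos;

    NotReplicatedSketchException(long watchedIndex, List<CommitInfo> commitInfos) {
      super("log index " + watchedIndex + " is not yet replicated to all servers");
      this.watchedIndex = watchedIndex;
      this.commitInfos = commitInfos;
    }
  }

  /**
   * Removes every datanode whose reported commit index is behind the watched
   * index (assumed criterion) and returns the removed ids. No second watch
   * request is sent: the exception already carries the commit infos.
   */
  static List<String> pruneLaggingDatanodes(Map<String, Long> commitInfoMap,
      NotReplicatedSketchException ex) {
    List<String> removed = new ArrayList<>();
    for (CommitInfo info : ex.commitInfos) {
      boolean lagging = info.commitIndex < ex.watchedIndex;
      if (lagging && commitInfoMap.remove(info.datanodeId) != null) {
        removed.add(info.datanodeId);
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    Map<String, Long> commitInfoMap = new HashMap<>();
    commitInfoMap.put("dn1", 10L);
    commitInfoMap.put("dn2", 10L);
    commitInfoMap.put("dn3", 4L);

    // Pretend the ALL_COMMITTED watch for index 10 failed on the leader.
    NotReplicatedSketchException ex = new NotReplicatedSketchException(10L,
        List.of(new CommitInfo("dn1", 10L), new CommitInfo("dn2", 10L),
            new CommitInfo("dn3", 4L)));

    System.out.println("removed: " + pruneLaggingDatanodes(commitInfoMap, ex));
  }
}
```

In this sketch the client handles the failed ALL_COMMITTED watch locally, so 
the MAJORITY_COMMITTED watch is only needed for failures that carry no commit 
information, such as a client-side timeout.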

This also requires changing the DN Ratis server watch timeout 
(hdds.ratis.raft.server.watch.timeout) so that it is lower than the client 
watch timeout (hdds.ratis.raft.client.rpc.watch.request.timeout). Otherwise 
the client request times out first, and TimeoutException is thrown instead of 
NotReplicatedException.
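
The ordering constraint on the two timeouts can be expressed as a small check. 
This is a minimal sketch: WatchTimeoutCheck and serverTimesOutFirst are 
hypothetical names, and the durations in main are illustrative values, not 
Ozone defaults.

```java
import java.time.Duration;

// Hypothetical helper, not part of Ozone or Ratis.
final class WatchTimeoutCheck {

  /**
   * Returns true when the server-side watch timeout
   * (hdds.ratis.raft.server.watch.timeout) is strictly lower than the
   * client-side watch request timeout
   * (hdds.ratis.raft.client.rpc.watch.request.timeout), so the server can
   * answer with NotReplicatedException before the client gives up.
   */
  static boolean serverTimesOutFirst(Duration serverWatchTimeout,
      Duration clientWatchRequestTimeout) {
    return serverWatchTimeout.compareTo(clientWatchRequestTimeout) < 0;
  }

  public static void main(String[] args) {
    // Illustrative values only.
    Duration server = Duration.ofSeconds(10);
    Duration client = Duration.ofSeconds(30);
    System.out.println("server times out first: "
        + serverTimesOutFirst(server, client));
  }
}
```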

  was:
See RATIS-2089 for the context.
{quote}In Ozone's XceiverClientRatis#watchForCommit, there are two watch 
commits request with different ReplicationLevel
 # Watch for ALL_COMMITTED 
 # Watch for MAJORITY_COMMITTED (If the previous watch threw an exception)

Based on the second watch request, the client will remove some failed datanode 
UUID from the commitInfoMap.

The second watch might not be necessary since the entries in 
AbstractCommitWatcher.commitIndexMap implies that the PutBlock request has been 
committed to the majority of the servers. Therefore, another MAJORITY_COMMITTED 
watch might not be necessary. From my understanding, the second 
MAJORITY_COMMITTED only serves to gain information to remove entries from 
commitInfoMap.

If the first watch failed with NotReplicatedException, we might be able to 
remove the need to a second watch request. Since NotReplicatedException is a 
Raft server exception, we can include the CommitInfoProtos in the 
NotReplicatedException. The client can use this CommitInfoProtos to remove the 
entry from commitInfoMap without sending another WATCH request. 
{quote}
We can CommitInfoProto in NotReplicatedException introduced in RATIS-2089 to 
remove the need for watch MAJORITY_COMMITTED calls if NotReplicatedException is 
thrown from the DN Ratis leader.

This also requires DN Ratis server watch timeout configuration change 
hdds.ratis.raft.server.watch.timeout to be lower than the client watch timeout 
hdds.ratis.raft.client.rpc.watch.request.timeout so that NotReplicatedException 
will be thrown instead of TimeoutException.


> Reduce number of watch requests by using CommitInfoProto from 
> NotReplicatedException 
> -------------------------------------------------------------------------------------
>
>                 Key: HDDS-10932
>                 URL: https://issues.apache.org/jira/browse/HDDS-10932
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Client
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>         Attachments: idea_1.patch
>
>
> See RATIS-2089 for the context.
> {quote}In Ozone's XceiverClientRatis#watchForCommit, there are two watch 
> commit requests with different ReplicationLevel values:
>  # Watch for ALL_COMMITTED 
>  # Watch for MAJORITY_COMMITTED (If the previous watch threw an exception)
> Based on the second watch request, the client removes the UUIDs of failed 
> datanodes from the commitInfoMap.
> The second watch might not be necessary, since the entries in 
> AbstractCommitWatcher.commitIndexMap imply that the PutBlock request has 
> already been committed to a majority of the servers. From my understanding, 
> the second MAJORITY_COMMITTED watch only serves to gather the information 
> needed to remove entries from commitInfoMap.
> If the first watch fails with NotReplicatedException, we might be able to 
> remove the need for a second watch request. Since NotReplicatedException is a 
> Raft server exception, we can include the CommitInfoProtos in the 
> NotReplicatedException. The client can then use these CommitInfoProtos to 
> remove the entries from commitInfoMap without sending another WATCH request. 
> {quote}
> We can use the CommitInfoProto carried by NotReplicatedException (introduced 
> in RATIS-2089) to avoid the MAJORITY_COMMITTED watch call when 
> NotReplicatedException is thrown by the DN Ratis leader.
> This also requires changing the DN Ratis server watch timeout 
> (hdds.ratis.raft.server.watch.timeout) so that it is lower than the client 
> watch timeout (hdds.ratis.raft.client.rpc.watch.request.timeout). Otherwise 
> the client request times out first, and TimeoutException is thrown instead of 
> NotReplicatedException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
