[https://issues.apache.org/jira/browse/HDDS-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17455050#comment-17455050]

István Fajth commented on HDDS-5954:
------------------------------------

Thank you for your time over Zoom, [~szetszwo]; let me quickly summarize our 
conclusions.

Ratis has a gRPC client implementation that can handle any kind of payload and 
uses a sliding window algorithm to put out-of-order async replies back in 
order. We can use this client in Ozone. Some changes may be required on the 
Ratis side to open up the API, but that is certainly doable, and the server 
side requires minimal to no changes.
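For readers unfamiliar with the technique, here is a minimal Java sketch of the 
sliding-window idea. It assumes each request gets a client-assigned sequence 
number. The names (SlidingWindowClient, trySend, onReply) are hypothetical and 
do not reflect the actual Ratis classes. Retries and timeouts are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a sliding-window client: it bounds the number of
 * outstanding requests and hands replies back in request order.
 * Illustrative only; this is not the Ratis API.
 */
final class SlidingWindowClient<R> {
  private final int windowSize;
  // Sequence number for the next request to be sent.
  private long nextSeqToSend = 0;
  // Sequence number of the oldest request whose reply has not been delivered.
  private long nextToDeliver = 0;
  // Replies that arrived ahead of an older, still-missing reply.
  private final TreeMap<Long, R> arrived = new TreeMap<>();

  SlidingWindowClient(int windowSize) {
    if (windowSize <= 0) {
      throw new IllegalArgumentException("windowSize must be positive");
    }
    this.windowSize = windowSize;
  }

  /** Returns a sequence number for a new request, or -1 if the window is full. */
  synchronized long trySend() {
    if (nextSeqToSend - nextToDeliver >= windowSize) {
      return -1;
    }
    return nextSeqToSend++;
  }

  /** Records a reply and returns every reply that can now be delivered, in order. */
  synchronized List<R> onReply(long seq, R reply) {
    if (seq < nextToDeliver || seq >= nextSeqToSend) {
      throw new IllegalArgumentException("unexpected sequence number " + seq);
    }
    arrived.put(seq, reply);
    List<R> ready = new ArrayList<>();
    // Release the contiguous run of replies starting at nextToDeliver;
    // delivering them slides the window forward.
    while (!arrived.isEmpty() && arrived.firstKey() == nextToDeliver) {
      ready.add(arrived.pollFirstEntry().getValue());
      nextToDeliver++;
    }
    return ready;
  }
}
```

A reply that arrives early (seq 1 before seq 0) is held back until the gap is 
filled, and the window only reopens once the oldest reply has been delivered.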

In the gRPC client, however, there is already an ordering guarantee when the 
same StreamObserver is used. If the server side processes the requests 
synchronously, the responses arrive in the same order the requests were sent. 
This is currently the case with the DataNode's Xceiver client service.

In the EC case, we have an additional synchronization layer between 
writeChunks and putBlocks. We need to ensure that all the stripes have been 
written before we commit the block information, because if there is a 
failure, our design requires re-writing the whole stripe to a different set of 
DataNodes. This synchronization makes it safe for the EC case to simply ignore 
whether the responses arrive in order or out of order, as we wait for all of 
them anyway.

Based on this, we agreed that the easiest way to fix the current gRPC client 
and enable async response processing is to reuse the StreamObserver pair 
towards a DataNode. The DataNode processes the requests in the dispatcher 
synchronously, so the responses arrive in order, while the client can still 
send further requests as it waits for the responses.
In the general case, we should also be fine relying on gRPC's ordering 
guarantees alone, as on the DN side the requests are processed in the 
dispatcher on the thread where they arrived.
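Because responses on a reused stream come back in request order, the client 
does not need sequence numbers to match them: each response belongs to the 
oldest outstanding request. Below is a minimal sketch under that assumption. 
OrderedStreamClient is a hypothetical class, and the actual call to the gRPC 
request observer is only indicated in a comment.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical sketch of a client that reuses one request/response stream.
 * It relies on the server answering in request order, so replies are
 * matched to requests first-in, first-out.
 */
final class OrderedStreamClient {
  // Futures of requests that were sent but not answered yet, oldest first.
  private final Deque<CompletableFuture<String>> inFlight = new ArrayDeque<>();

  /** Sends a request without waiting for replies to earlier requests. */
  CompletableFuture<String> send(String request) {
    CompletableFuture<String> reply = new CompletableFuture<>();
    synchronized (this) {
      inFlight.addLast(reply);
      // A real client would pass `request` to the stream's request observer
      // (onNext) here, inside the lock, so the queue order matches send order.
    }
    return reply;
  }

  /** Invoked for each response received on the stream. */
  void onResponse(String response) {
    CompletableFuture<String> oldest;
    synchronized (this) {
      oldest = inFlight.pollFirst();
    }
    if (oldest == null) {
      throw new IllegalStateException("response without an outstanding request");
    }
    // Complete outside the lock so dependent callbacks do not run while holding it.
    oldest.complete(response);
  }
}
```

This only holds while the DataNode keeps processing requests synchronously. If 
the server ever answered out of order, the client would need sequence numbers 
again, as in a sliding-window client.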

So we are fine with the current implementation in Ozone. However, we both 
agreed that it would be beneficial to switch to the Ratis implementation of 
the gRPC client, as that leaves us with only one codebase to maintain and lets 
us reuse features that Ratis already provides, such as ordered, fully async 
processing that is async on both the client and the server side. As agreed, I 
will create a separate JIRA for this improvement that we can take on later.

> EC: Review the TODOs in GRPC Xceiver client and fix them.
> ---------------------------------------------------------
>
>                 Key: HDDS-5954
>                 URL: https://issues.apache.org/jira/browse/HDDS-5954
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Uma Maheswara Rao G
>            Assignee: István Fajth
>            Priority: Major
>
> Currently there are 4 TODOs in the GRPC client.
> 1. L331 adds a note that we should cache the current leader, so that we can 
> go to the leader next time.
> 2. L422 adds a note about sendCommandAsync, stating that it is not async. 
> The code, on the other hand, seems to return a CompletableFuture instance 
> wrapped inside an XceiverClientReply, though sometimes we wait on the future 
> before actually returning.
> 3. L452 notes that async requests are served out of order, and that this 
> should be revisited if we make the API async.
> 4. L483 is connected to #2, and notes that we should reuse stream observers 
> if we are going down the async route.
> The latter three require deeper investigation and understanding, to see how 
> we can approach fixing them, and to figure out whether we really need to.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
