[
https://issues.apache.org/jira/browse/HDDS-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17455050#comment-17455050
]
István Fajth commented on HDDS-5954:
------------------------------------
Thank you for your time over Zoom, [~szetszwo]. Let me quickly summarize our
conclusions.
Ratis has a gRPC client implementation that can handle any kind of payload and
uses a sliding window algorithm to put unordered async replies back in order.
We can use this client in Ozone. Some changes may be required on the Ratis side
to open up the API, but that is certainly doable, and the server side requires
minimal to no changes for this.
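To illustrate the sliding window idea, here is a minimal sketch. It is not the
Ratis code: the class ReplyWindow and its methods are hypothetical names, and it
assumes every request carries a sequence number starting at 0 and that replies
are never null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper (not a Ratis class): hands replies over in sequence order. */
final class ReplyWindow<T> {
  // Replies that arrived before all of their predecessors did.
  private final Map<Long, T> pending = new HashMap<>();
  private final Consumer<T> inOrderConsumer;
  // Lowest sequence number that has not been delivered yet.
  private long nextToDeliver = 0;

  ReplyWindow(Consumer<T> inOrderConsumer) {
    this.inOrderConsumer = inOrderConsumer;
  }

  /** Called whenever a reply arrives, in any order. */
  synchronized void onReply(long seqNum, T reply) {
    pending.put(seqNum, reply);
    // Slide the window forward over every consecutive reply we now hold.
    T next;
    while ((next = pending.remove(nextToDeliver)) != null) {
      inOrderConsumer.accept(next);
      nextToDeliver++;
    }
  }

  public static void main(String[] args) {
    List<String> delivered = new ArrayList<>();
    ReplyWindow<String> window = new ReplyWindow<>(delivered::add);
    window.onReply(2, "c"); // held back, 0 and 1 are still missing
    window.onReply(0, "a"); // delivered immediately
    window.onReply(1, "b"); // delivers "b" and then the buffered "c"
    System.out.println(delivered);
  }
}
```

The caller only ever sees replies in request order, no matter how the server
interleaves its work.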
The gRPC client, however, already has an ordering guarantee within a single
StreamObserver: if the server side processes the requests synchronously, the
responses arrive in the order in which the requests were sent. This is
currently the case with the DataNode's Xceiver client service.
In the EC case we have an additional synchronization layer between writeChunks
and putBlocks, because we need to ensure that all the stripes have been written
before we commit the block information: if there is a failure, our design
requires rewriting the whole stripe to a different set of DataNodes. This
synchronization makes it safe for the EC case to simply ignore whether the
responses arrive in order or out of order, as we wait anyway.
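The synchronization point can be pictured with a small sketch. This is not the
Ozone EC code: StripeCommitSketch and writeStripeThenCommit are hypothetical
names, and plain strings stand in for the real writeChunk and putBlock messages.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: commit block information only after all chunk writes. */
final class StripeCommitSketch {

  /**
   * Returns a future that runs the commit step once every chunk write has
   * completed. The order in which the individual replies arrive is irrelevant.
   * If any write fails, the returned future fails and the commit step is
   * skipped, so the caller can rewrite the stripe elsewhere.
   */
  static CompletableFuture<String> writeStripeThenCommit(
      List<CompletableFuture<String>> chunkWrites) {
    CompletableFuture<Void> allWritten =
        CompletableFuture.allOf(chunkWrites.toArray(new CompletableFuture[0]));
    return allWritten.thenApply(
        ignored -> "putBlock after " + chunkWrites.size() + " chunk writes");
  }

  public static void main(String[] args) {
    CompletableFuture<String> chunk0 = new CompletableFuture<>();
    CompletableFuture<String> chunk1 = new CompletableFuture<>();
    CompletableFuture<String> commit = writeStripeThenCommit(List.of(chunk0, chunk1));

    chunk1.complete("chunk-1 written");  // replies arrive out of order
    System.out.println(commit.isDone()); // still waiting for chunk0
    chunk0.complete("chunk-0 written");
    System.out.println(commit.join());
  }
}
```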
Based on this we agreed that the easiest way to fix the current gRPC client and
enable async response processing is to reuse the StreamObserver pair towards a
DataNode. The DataNode will process the requests in the dispatcher
synchronously, and with that the responses will arrive in order, but we can
send further requests while we are waiting on the responses.
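A rough sketch of this approach follows. It is not the XceiverClientGrpc code:
OrderedStreamClient is a hypothetical name, a java.util.function.Consumer stands
in for the request-side StreamObserver, and the onNext method stands in for the
callback of the response-side StreamObserver. It relies on the assumption stated
above, that responses on one stream arrive in request order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical sketch: one long-lived request/response stream per DataNode. */
final class OrderedStreamClient {
  // Futures of requests that were sent but not yet answered, oldest first.
  private final Queue<CompletableFuture<String>> inFlight = new ArrayDeque<>();
  // Stand-in for the request-side StreamObserver of the reused stream.
  private final Consumer<String> requestStream;

  OrderedStreamClient(Consumer<String> requestStream) {
    this.requestStream = requestStream;
  }

  /** Sends without waiting for earlier replies to arrive. */
  synchronized CompletableFuture<String> sendAsync(String request) {
    CompletableFuture<String> reply = new CompletableFuture<>();
    inFlight.add(reply); // register before sending, so a fast reply finds it
    requestStream.accept(request);
    return reply;
  }

  /** Stand-in for the response observer callback; completes the oldest request. */
  void onNext(String response) {
    CompletableFuture<String> oldest;
    synchronized (this) {
      oldest = inFlight.remove();
    }
    oldest.complete(response); // complete outside the lock
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    OrderedStreamClient client = new OrderedStreamClient(sent::add);
    CompletableFuture<String> first = client.sendAsync("writeChunk-1");
    CompletableFuture<String> second = client.sendAsync("writeChunk-2");
    System.out.println(sent); // both requests went out before any reply
    client.onNext("reply-1");
    client.onNext("reply-2");
    System.out.println(first.join() + ", " + second.join());
  }
}
```

Because the stream preserves order, a FIFO queue is enough to match each reply
to its request, and no sequence numbers or reordering buffer are needed.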
In the general case we should also be fine with just the gRPC ordering
guarantees, as the requests are processed on the DN side in the dispatcher, on
the thread where they arrived.
So we are good with the current implementation in Ozone. However, we both agreed
that it would be beneficial to switch to the Ratis implementation of the gRPC
client, as that means we have just one codebase to maintain and we can reuse the
features that are already available in Ratis, such as ordered, fully async
processing that is async on both the client and the server side. As we agreed, I
will create a separate JIRA for this improvement, which we can take on later.
> EC: Review the TODOs in GRPC Xceiver client and fix them.
> ---------------------------------------------------------
>
> Key: HDDS-5954
> URL: https://issues.apache.org/jira/browse/HDDS-5954
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Uma Maheswara Rao G
> Assignee: István Fajth
> Priority: Major
>
> Currently there are 4 TODO-s in the GRPC client.
> 1. L331 adds a note that we should cache the current leader, so that we can
> go to the leader next time.
> 2. L422 adds a note about sendCommandAsync, which states that it is not
> async. The code on the other hand seems to be returning a CompletableFuture
> instance wrapped inside an XceiverClientReply, though sometimes we wait on
> the future before really returning.
> 3. L452 notes that async requests are served out of order, and this should be
> revisited if we make the API async.
> 4. L483 is connected to #2, and it notes that we should reuse stream
> observers if we are going down the async route
> The latter three require deeper investigation and understanding, to see how
> we can approach fixing them, and to figure out whether we really need to fix them.