[https://issues.apache.org/jira/browse/HDDS-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481085#comment-17481085]
István Fajth commented on HDDS-5954:
------------------------------------
To summarize further experiments on this, I have created a markdown document
about how the gRPC client works in Ozone.
The main points and consequences:
- all calls are made synchronously today, and there are traces of an intent to
make only WriteChunk and PutBlock asynchronous
- WriteChunk and PutBlock are not called asynchronously either, because nothing
else guarantees the ordering between them, so we chose to wait for the results
of these calls as well.
- EC will have external synchronization points, so it can afford to have
WriteChunk and PutBlock called truly asynchronously
- The standalone client is not used for writes except in tests, or when a client
explicitly asks for the standalone client. (Note that RandomKeyGenerator in freon
uses the standalone client for writes by default.)
- Caching the StreamObservers introduces a new problem: results are not assigned
to the completable futures in the client reply directly, but indirectly, based
on the internal ordering within the stream pair of the StreamObservers, which is
speculative. Moreover, using the same StreamObserver synchronizes the requests
within one stream pair, so even though we get ordering, we cannot get async
calls by caching the StreamObservers.
- Even though creating the StreamObserver pairs is costly compared to requests
with almost zero processing cost, reusing them does not gain much when
processing the requests takes time as well.
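The reply-matching problem with cached StreamObservers can be illustrated without gRPC itself. The class below is hypothetical (it is not part of XceiverClientGrpc): one cached stream pair keeps a queue of pending CompletableFutures and completes the oldest one whenever a reply arrives. It assumes replies carry no request id, so the matching relies entirely on the server answering in send order, which the TODO quoted below says is not guaranteed for async requests. Sending is synchronized because a gRPC StreamObserver must not be called from several threads at once, and that same lock is what serializes the callers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch (not Ozone code): one cached request/response stream
// pair shared by many calls. Replies carry no request id in this sketch, so
// they can only be matched to callers by arrival order.
class SharedStreamSketch {
  // Futures of requests that were sent but not answered yet, oldest first.
  private final Deque<CompletableFuture<String>> pending = new ArrayDeque<>();

  // Synchronized so the queue order equals the send order; a gRPC
  // StreamObserver must not be called concurrently in any case.
  synchronized CompletableFuture<String> send(String request) {
    CompletableFuture<String> reply = new CompletableFuture<>();
    pending.addLast(reply);
    // A real client would call a (hypothetical) requestObserver.onNext(request) here.
    return reply;
  }

  // Invoked for each reply arriving on the response stream.
  synchronized void onReply(String response) {
    CompletableFuture<String> oldest = pending.pollFirst();
    if (oldest != null) {
      oldest.complete(response);
    }
  }

  public static void main(String[] args) {
    SharedStreamSketch stream = new SharedStreamSketch();
    CompletableFuture<String> chunk = stream.send("WriteChunk");
    CompletableFuture<String> block = stream.send("PutBlock");
    // The server answers out of order: each caller receives the other's reply.
    stream.onReply("PutBlock-ack");
    stream.onReply("WriteChunk-ack");
    System.out.println(chunk.join()); // PutBlock-ack
    System.out.println(block.join()); // WriteChunk-ack
  }
}
```

If the replies arrived in send order, each future would get its own reply; the point of the sketch is that correctness then rests on that ordering alone.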
Because of all this, and because the standalone client is not deprecated and
can be used from the CLI, we should not modify the XceiverClientGrpc class
directly; instead, we can specialize it for EC.
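A minimal sketch of what such a specialization could look like, under stated assumptions: OrderedWriteClient, EcWriteClient, sendAsync and writeBlock are hypothetical names (not Ozone APIs), the datanode call is simulated with an executor, and the EC "external synchronization point" is modeled as a single CompletableFuture.allOf wait before PutBlock. The base class keeps the current behavior of awaiting each call, which is what guarantees the WriteChunk-before-PutBlock ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class EcClientDemo {
  public static void main(String[] args) {
    EcWriteClient client = new EcWriteClient();
    client.writeBlock(List.of("c1", "c2", "c3"));
    client.transport.shutdown();
    // The WriteChunk entries may appear in any order; PutBlock is always last.
    System.out.println(client.completed);
  }
}

// Hypothetical base client: each call is awaited before the next is sent,
// so every WriteChunk completes before the PutBlock that follows it.
class OrderedWriteClient {
  final List<String> completed = Collections.synchronizedList(new ArrayList<>());
  final ExecutorService transport = Executors.newFixedThreadPool(4);

  // Simulated asynchronous datanode call; records the request on completion.
  CompletableFuture<String> sendAsync(String request) {
    return CompletableFuture.supplyAsync(() -> {
      completed.add(request);
      return request + "-ack";
    }, transport);
  }

  void writeBlock(List<String> chunks) {
    for (String chunk : chunks) {
      sendAsync("WriteChunk " + chunk).join(); // waiting here gives the ordering
    }
    sendAsync("PutBlock").join();
  }
}

// Hypothetical EC specialization: WriteChunk calls are issued without waiting,
// and ordering is restored at one external synchronization point.
class EcWriteClient extends OrderedWriteClient {
  @Override
  void writeBlock(List<String> chunks) {
    List<CompletableFuture<String>> inFlight = new ArrayList<>();
    for (String chunk : chunks) {
      inFlight.add(sendAsync("WriteChunk " + chunk)); // no per-call wait
    }
    // Synchronization point: every WriteChunk must finish before PutBlock.
    CompletableFuture.allOf(inFlight.toArray(new CompletableFuture<?>[0])).join();
    sendAsync("PutBlock").join();
  }
}
```

Overriding only the write path leaves the base class, and therefore the standalone client used from the CLI, unchanged.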
Also, as a task we should have done earlier: since RATIS replication is the
default, we should have switched our tests to use RATIS replication, and we
should deprecate the standalone client for writes and state that explicitly.
To get there, I am cancelling the PR for this JIRA, moving the work under the
new tickets, and closing this ticket as well.
I have created the following JIRAs to track this effort further:
- HDDS-6217 - Cleanup XceiverClientGrpc TODOs, and document how the client works
  and should be used.
- HDDS-6218 - Deprecate the standalone client for writes
- HDDS-6219 - Switch to RATIS ReplicationType from STAND_ALONE in our tests
- HDDS-6220 - EC: Introduce a gRPC client implementation for EC with really async
  WriteChunk and PutBlock (on EC branch)
> EC: Review the TODOs in GRPC Xceiver client and fix them.
> ---------------------------------------------------------
>
> Key: HDDS-5954
> URL: https://issues.apache.org/jira/browse/HDDS-5954
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Uma Maheswara Rao G
> Assignee: István Fajth
> Priority: Major
> Labels: pull-request-available
>
> Currently there are four TODOs in the GRPC client.
> 1. L331 adds a note that we should cache the current leader, so that we can
> go to the leader next time.
> 2. L422 adds a note about sendCommandAsync, which states that it is not
> async. The code, on the other hand, seems to return a CompletableFuture
> instance wrapped inside an XceiverClientReply, though in some cases we wait on
> the future before actually returning.
> 3. L452 notes that async requests are served out of order, and this should be
> revisited if we make the API async.
> 4. L483 is connected to #2, and it notes that we should reuse stream
> observers if we are going down the async route.
> The latter three require deeper investigation and understanding, to see how
> we can approach fixing them, and to figure out whether we really need to.
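TODO #1 (caching the current leader) could take roughly the following shape. This is a hypothetical sketch, not the XceiverClientGrpc code: node identifiers are plain strings, and "leader" is assumed to simply mean the last node that answered successfully, which is then tried first on the next call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of TODO #1: remember the node that last served a
// request successfully and try it first next time.
class LeaderCache {
  private final AtomicReference<String> lastLeader = new AtomicReference<>();

  // Returns the nodes to try, with the cached leader (if still present) first.
  List<String> orderNodes(List<String> pipelineNodes) {
    List<String> ordered = new ArrayList<>(pipelineNodes);
    String leader = lastLeader.get();
    if (leader != null && ordered.remove(leader)) {
      ordered.add(0, leader);
    }
    return ordered;
  }

  // Called after a node answered successfully.
  void recordSuccess(String node) {
    lastLeader.set(node);
  }

  public static void main(String[] args) {
    LeaderCache cache = new LeaderCache();
    List<String> nodes = List.of("dn1", "dn2", "dn3");
    System.out.println(cache.orderNodes(nodes)); // [dn1, dn2, dn3]
    cache.recordSuccess("dn3");
    System.out.println(cache.orderNodes(nodes)); // [dn3, dn1, dn2]
  }
}
```

A cached node that is no longer in the pipeline is simply ignored, so a stale entry costs nothing beyond the lookup.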