[
https://issues.apache.org/jira/browse/HDDS-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453040#comment-17453040
]
István Fajth commented on HDDS-5954:
------------------------------------
After experimenting with how gRPC works, it seems there is a way to solve the
problem of ordering requests properly.
I came to the following conclusions:
- if we reuse the StreamObserver on the client side, there is a possible two
orders of magnitude speedup (see the test details below).
- if we reuse the StreamObserver, using just one StreamObserver pair per client
thread, and the server side processes the requests inside the onNext method
(or, if it delegates the processing to a thread, it always delegates to the
same thread and processes them single threaded per client thread), then the
requests are served in the order in which they were posted by the client
thread. A sketch of this reuse pattern is shown right after this list.
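To illustrate the reuse pattern, here is a minimal sketch. It assumes the
generated XceiverClientProtocolServiceGrpc async stub and its bidirectional
send method; the response handling is only outlined and the method name is
illustrative, not actual code from the client.
{code:java}
import io.grpc.stub.StreamObserver;

// Created once per client thread and then reused for every request.
private StreamObserver<ContainerCommandRequestProto> createRequestObserver(
    XceiverClientProtocolServiceGrpc.XceiverClientProtocolServiceStub stub) {
  return stub.send(new StreamObserver<ContainerCommandResponseProto>() {
    @Override
    public void onNext(ContainerCommandResponseProto response) {
      // complete the pending future that belongs to this response
    }

    @Override
    public void onError(Throwable t) {
      // fail all pending futures on this stream
    }

    @Override
    public void onCompleted() {
      // the server closed the stream
    }
  });
}

// Subsequent requests from the same thread go through the same observer:
//   requestObserver.onNext(writeChunkRequest);
//   requestObserver.onNext(putBlockRequest);
// gRPC preserves message order on a single stream, so the server receives
// the requests in the order the thread posted them.
{code}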
In the EC case we have additional synchronization after a stripe is written, as
we need to ensure that all the data and parity chunks of a given stripe were
written properly before we call putBlock. That alone would be enough for us to
enable async writes, but as the aim here is to address at least the relevant
TODOs in the XceiverClientGrpc class, we should go further and make the client
side operations async. (Still, this only concerns writeChunk and putBlock, as
the other client calls are sent via the synchronizing sendCommand method.)
Evaluating the server side shows the following:
- XceiverServerGrpc is the class that initializes and handles the lifecycle of
the server side of XceiverClientGrpc.
- GrpcXceiverService is the server side implementation of the
XceiverClientProtocolService, which handles the messages defined in
DatanodeClientProtocol.ContainerCommandRequestProto.
- GrpcXceiverService is instantiated with the HddsDispatcher instance that the
XceiverServerGrpc uses, and it also uses the dispatcher's dispatch method to
process a request, just as the XceiverServerGrpc's submitRequest method does.
- the HddsDispatcher's dispatch method does not seem to utilize any background
thread to process the request; it computes and returns the response on the same
thread from which it was called.
Based on these we can say that the server side processes the requests from the
same observer in the same thread, ensuring that the requests are served in the
same order as they were sent. (NOTE: ordering is ensured only for requests
coming in via the same StreamObserver pair.)
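For reference, the server side behaviour described above boils down to roughly
the following. This is a simplified sketch in the spirit of GrpcXceiverService,
with validation, auditing and error handling omitted.
{code:java}
@Override
public StreamObserver<ContainerCommandRequestProto> send(
    StreamObserver<ContainerCommandResponseProto> responseObserver) {
  return new StreamObserver<ContainerCommandRequestProto>() {
    @Override
    public void onNext(ContainerCommandRequestProto request) {
      // dispatch() runs on the gRPC thread delivering onNext, so requests
      // arriving on this stream are processed one by one, in arrival order
      ContainerCommandResponseProto response =
          dispatcher.dispatch(request, null);
      responseObserver.onNext(response);
    }

    @Override
    public void onError(Throwable t) {
      // the client aborted the stream; nothing left to respond to
    }

    @Override
    public void onCompleted() {
      responseObserver.onCompleted();
    }
  };
}
{code}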
The experiments I measured the throughput with:
1. experiment: the client side creates 32 StreamObserver pairs and 32 threads,
and sends 10k requests to the server per thread. Requests are simple create
container requests; the server responds to each with a successful create
container response. Measured is the time to get the observer from an ArrayList
and send the request with the onNext method.
2. experiment: the client side creates 32 threads and sends 10k requests to the
server per thread. Requests and responses are the same as in the first
experiment. Measured is the time to create the StreamObserver pair against the
service and send the request.
Neither experiment counted the time required for the response to get back to
the client side, only the time required to send the request.
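In rough code terms, the only difference between the two experiments is where
the StreamObserver pair is created. The snippet below is an illustrative
outline, not the actual test code; buildCreateContainerRequest() and
newResponseObserver() stand in for the test helpers.
{code:java}
// experiment 1: one observer per thread, created before the measured section
StreamObserver<ContainerCommandRequestProto> observer = observers.get(threadIdx);
for (int i = 0; i < 10_000; i++) {
  observer.onNext(buildCreateContainerRequest());
}

// experiment 2: a new observer pair is created against the service for every
// request, inside the measured section
for (int i = 0; i < 10_000; i++) {
  StreamObserver<ContainerCommandRequestProto> perRequestObserver =
      asyncStub.send(newResponseObserver());
  perRequestObserver.onNext(buildCreateContainerRequest());
}
{code}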
Runtime for #1 was ~200ms, while runtime for #2 was ~9 seconds, so there is a
huge gain if we reuse the StreamObservers already created towards a DataNode.
One more observation: if I first create the stream pairs and then wait 2
minutes before sending any data over them, sending the requests takes
significantly longer, even longer than the initial setup of the channels,
stubs, and StreamObserver pairs. So it is important to look into why this
penalty happens and how we can avoid it.
Finally, here is the general idea I think we should implement in master:
add a map from DataNode UUID to StreamObserver in the XceiverClientGrpc class,
so that every client can have its own StreamObserver to reuse when sending
requests to the server. This is the bare minimum we can do, and with it we can
let the writeChunk and putBlock requests (basically all write requests in the
gRPC client) go async. Even if a client is used from multiple threads, the
per-thread ordering would be kept, and we do not currently support writing to
the same key from multiple threads anyway.
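A very rough sketch of that minimal change inside XceiverClientGrpc follows;
the field and method names (requestObservers, createResponseObserver, and the
asyncStubs lookup) are only illustrative, not a final design.
{code:java}
// one reusable request observer per DataNode, keyed by the DataNode UUID
private final Map<UUID, StreamObserver<ContainerCommandRequestProto>>
    requestObservers = new ConcurrentHashMap<>();

private StreamObserver<ContainerCommandRequestProto> getRequestObserver(
    DatanodeDetails dn) {
  return requestObservers.computeIfAbsent(dn.getUuid(),
      id -> asyncStubs.get(id).send(createResponseObserver(id)));
}

// sendCommandAsync() would then simply post on the cached observer:
//   getRequestObserver(dn).onNext(request);
// and complete the returned future when the matching response arrives in the
// response observer.
{code}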
Optionally we can explore the idea of sharing the channel/stub/StreamObserver
between clients, though in that case we should also reevaluate how we cache the
clients themselves, and whether we should cache the gRPC
channel/stub/StreamObserver in a separate cache and use those new caches in the
client... This goes a bit further and falls outside EC's scope, though it is an
interesting area for optimizations on the read side of Ozone in general.
> EC: Review the TODOs in GRPC Xceiver client and fix them.
> ---------------------------------------------------------
>
> Key: HDDS-5954
> URL: https://issues.apache.org/jira/browse/HDDS-5954
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Uma Maheswara Rao G
> Assignee: István Fajth
> Priority: Major
>
> Currently there are 4 TODOs in the GRPC client.
> 1. L331 adds a note that we should cache the current leader, so that we can
> go to the leader next time.
> 2. L422 adds a note about sendCommandAsync, which states that it is not
> async. The code on the other hand seems to be returning a CompletableFuture
> instance wrapped inside an XceiverClientReply, though sometimes we wait on
> the future before really returning.
> 3. L452 notes that async requests are served out of order, and this should be
> revisited if we make the API async.
> 4. L483 is connected to #2, and it notes that we should reuse stream
> observers if we are going down the async route
> The latter three require deeper investigation and understanding, to see how
> we can approach fixing them, and to figure out whether we really need to fix them.