[ https://issues.apache.org/jira/browse/HDDS-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453040#comment-17453040 ]

István Fajth commented on HDDS-5954:
------------------------------------

After experimenting with how gRPC works, it seems possible to properly solve 
the problem of ordering requests.

I came to the following conclusions:
- if we reuse the StreamObserver on the client side, there is a possible two 
orders of magnitude speedup. See the test details below.
- if we reuse the StreamObserver, using just one StreamObserver pair per client 
thread, and the server side processes the requests inside the onNext method (or 
delegates all of them to one and the same thread, so processing stays single 
threaded per client thread), then the requests are served in the order in which 
the client thread posted them. A sketch of this reuse follows the list.
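
As a minimal sketch of the client-side reuse, assuming the Ozone generated stub 
classes (XceiverClientProtocolServiceGrpc with its bidirectional send call and 
the ContainerCommandRequestProto/ResponseProto messages); the package and class 
names are taken from the Ozone tree and may need adjusting:

{code:java}
import io.grpc.ManagedChannel;
import io.grpc.stub.StreamObserver;
import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerCommandRequestProto;
import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.ContainerCommandResponseProto;
import org.apache.hadoop.hdds.protocol.datanode.proto.XceiverClientProtocolServiceGrpc;

class ReusedStreamSender {

  private final StreamObserver<ContainerCommandRequestProto> requestObserver;

  ReusedStreamSender(ManagedChannel channel,
      StreamObserver<ContainerCommandResponseProto> responseObserver) {
    // One bidi stream is opened once and then reused for every request
    // posted by this client thread.
    this.requestObserver = XceiverClientProtocolServiceGrpc.newStub(channel)
        .send(responseObserver);
  }

  void submit(ContainerCommandRequestProto request) {
    // gRPC preserves message order within a single stream, so requests
    // posted here arrive at the server in this call order.
    requestObserver.onNext(request);
  }

  void close() {
    requestObserver.onCompleted();
  }
}
{code}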

In the EC case we have additional synchronization after a stripe is written, as 
we need to ensure that all the data and parity chunks for a given stripe were 
written properly, and only after that do we call putBlock. That alone would be 
enough for us to enable async writes, but as the aim here is to address at 
least the relevant TODOs in the XCeiverClientGRPC class, we should go for it 
and make the client side operations async. (Still, this is just writeChunk and 
putBlock, as the other client calls are sent via the synchronizing sendCommand 
method.)
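
A minimal sketch of that stripe-level synchronization point; the chunk write 
futures and the putBlock callback are placeholders standing in for the real 
async client calls, not the actual EC writer API:

{code:java}
import java.util.List;
import java.util.concurrent.CompletableFuture;

class StripeWriterSketch {

  CompletableFuture<Void> writeStripe(
      List<CompletableFuture<Void>> chunkWriteFutures, Runnable putBlock) {
    // putBlock must only run once every data and parity chunk write of the
    // stripe has been acknowledged.
    return CompletableFuture
        .allOf(chunkWriteFutures.toArray(new CompletableFuture[0]))
        .thenRun(putBlock);
  }
}
{code}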

Evaluating the server side shows the following state there:
- XceiverServerGrpc is the class that initializes and handles the lifecycle of 
the XCeiverClientGRPC's server side.
- GrpcXceiverService is the server side implementation of 
XceiverClientProtocolService, which handles the messages defined in 
DatanodeClientProtocol.ContainerCommandRequestProto.
- GrpcXceiverService is instantiated with the HddsDispatcher instance that 
XceiverServerGrpc uses, and it uses the dispatcher's dispatch method to process 
a request, just as XceiverServerGrpc's submitRequest method does.
- the HddsDispatcher's dispatch method does not seem to use any background 
thread to process the request; it computes and returns the response on the same 
thread from which it was called.

Based on these we can say that the server side processes the requests from the 
same observer in the same thread, ensuring that the requests are served in the 
same order as they were sent. (NOTE: ordering is ensured only for the requests 
coming in via the same StreamObserver pair.)
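
A simplified sketch of that server-side behaviour, with the dispatcher reduced 
to a single dispatch method (the real HddsDispatcher takes more context than 
shown here):

{code:java}
import io.grpc.stub.StreamObserver;

class OrderedServerHandlerSketch {

  interface Dispatcher<Req, Resp> {
    Resp dispatch(Req request);
  }

  static <Req, Resp> StreamObserver<Req> handler(
      Dispatcher<Req, Resp> dispatcher, StreamObserver<Resp> responseObserver) {
    return new StreamObserver<Req>() {
      @Override
      public void onNext(Req request) {
        // No hand-off to another thread: the response is computed here, in
        // the same thread and order in which gRPC delivers messages on this
        // stream.
        responseObserver.onNext(dispatcher.dispatch(request));
      }

      @Override
      public void onError(Throwable t) {
        responseObserver.onError(t);
      }

      @Override
      public void onCompleted() {
        responseObserver.onCompleted();
      }
    };
  }
}
{code}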

The experiment I measured the throughput with:
1. experiment: client side creates 32 StreamObserver pair and 32 threads, and 
sends 10k requests over to the server per thread. Requests are simple create 
container requests. Server responds with a success create container response 
created and returned. Measured is the time to get the observer from an 
ArrayList, and send the request with the onNext method.
2. experiment: client side creates 32 threads, and sends 10k requests over to 
the server per thread. Requests are simple create container requests. Server 
responds with a success create container response created and returned.  
Measured is the time to create the StreamObserver pair agains the service, and 
sending the request.
None of the experiments counted the time required for the response to get back 
to the client side, just the time required to send the request.
Runtime for #1 was ~200ms, while runtime for #2 was ~9 seconds, so there is a 
huge gain if we reuse the StreamObservers already created towards a DataNode.
If I first create stream pairs, and then wait 2 minutes before sending any data 
over the stream pairs, then sending the requests takes a significantly longer 
time, even longer then the initial setup time of the channels, stubs, and 
streamobserver pairs. So it is important to look into how we can avoid this 
penalty, and why it is happening.
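
For reference, a simplified sketch of how experiment #1 can be timed; the 
request object and the pre-created observers are placeholders for the real 
Ozone request builders and stub wiring:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import io.grpc.stub.StreamObserver;

class ReuseBenchmarkSketch {

  static <Req> long timeReusedObservers(List<StreamObserver<Req>> observers,
      Req request, int requestsPerThread) throws InterruptedException {
    long start = System.nanoTime();
    List<Thread> threads = new ArrayList<>();
    // One thread per pre-created observer, mirroring the 32 thread /
    // 32 observer setup of experiment #1.
    for (StreamObserver<Req> observer : observers) {
      Thread t = new Thread(() -> {
        for (int i = 0; i < requestsPerThread; i++) {
          // Only the send is measured; responses are not awaited here.
          observer.onNext(request);
        }
      });
      t.start();
      threads.add(t);
    }
    for (Thread t : threads) {
      t.join();
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }
}
{code}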


Finally, here is the general idea I think we should implement in master:
Add a DataNode UUID to StreamObserver map in the XCeiverClientGrpc class, so 
that every client has its own StreamObserver to reuse when sending requests to 
the server. This is the bare minimum we can do, and with it we can let the 
writeChunk and putBlock requests (basically all write requests in the grpc 
client) go async. Even if a client is used from multiple threads, the per 
thread ordering would be kept, and we do not currently support writing to the 
same key from multiple threads anyway.
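
A minimal sketch of that per-DataNode observer map; openStreamTo is a 
hypothetical placeholder for building the stub and opening the bidi stream 
towards the given node:

{code:java}
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

import io.grpc.stub.StreamObserver;

class PerDatanodeStreamCache<Req> {

  private final ConcurrentMap<UUID, StreamObserver<Req>> observers =
      new ConcurrentHashMap<>();
  private final Function<UUID, StreamObserver<Req>> openStreamTo;

  PerDatanodeStreamCache(Function<UUID, StreamObserver<Req>> openStreamTo) {
    this.openStreamTo = openStreamTo;
  }

  void send(UUID datanodeUuid, Req request) {
    // Reuse the stream already opened to this DataNode; it is created lazily
    // on first use and then kept for all later requests to the same node.
    observers.computeIfAbsent(datanodeUuid, openStreamTo).onNext(request);
  }
}
{code}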

Optionally we can explore the idea of sharing the channel/stub/StreamObserver 
between clients, though in that case we should also reevaluate how we are 
caching the clients themselves, and whether we should cache the grpc 
channel/stub/StreamObserver in another cache and use these new caches in the 
client... This goes a bit further and falls out of EC's scope, though it is an 
interesting area for optimizations on the read side of Ozone in general.



> EC: Review the TODOs in GRPC Xceiver client and fix them.
> ---------------------------------------------------------
>
>                 Key: HDDS-5954
>                 URL: https://issues.apache.org/jira/browse/HDDS-5954
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Uma Maheswara Rao G
>            Assignee: István Fajth
>            Priority: Major
>
> Currently there are 4 TODO-s in the GRPC client.
> 1. L331 adds a note that we should cache the current leader, so that we can 
> go to the leader next time.
> 2. L422 adds a note about sendCommandAsync, which states that it is not 
> async. The code on the other hand seems to be returning a CompletableFuture 
> instance wrapped inside an XceiverClientReply, though sometimes we wait on 
> the future before really returning.
> 3. L452 notes that async requests are served out of order, and this should be 
> revisited if we make the API async.
> 4. L483 is connected to #2, and it notes that we should reuse stream 
> observers if we are going down the async route
> The latter three require deeper investigation and understanding, to see how 
> we can approach fixing it, and to figure out whether we really need to fix it.


