[
https://issues.apache.org/jira/browse/HDDS-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451849#comment-17451849
]
István Fajth commented on HDDS-5954:
------------------------------------
As we discussed in a Zoom session yesterday with [~umamaheswararao] and
[~sodonnell], this is where EC stands with regard to these TODOs:
#1 is irrelevant for us; we can leave it as it is.
#2-4 are something we want to address partially.
Based on a discussion with [~shashikant], the note in #3 is a general problem.
It can cause trouble when a putBlock request is served before a related
writeChunk request: the putBlock will (if it succeeds) persist the length of
the block in the DN's related RocksDB schema, and if the writeChunk is served
out of order later on and fails, then we might end up with an inconsistency
between data and metadata. So for a general solution, the ordering of these
requests has to be solved before we enable them to be async.
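To make the hazard concrete, here is a minimal Java sketch. It is not Ozone
code: OutOfOrderSketch, FakeDatanode and their methods are hypothetical
stand-ins that only model "putBlock persists the block length" and "writeChunk
stores data or fails". It compares the two serving orders for the same pair of
requests.

```java
/** Hypothetical model, not the Ozone API: block metadata running ahead of data. */
public class OutOfOrderSketch {

  /** Stand-in for a datanode: bytes actually stored vs. the persisted length. */
  static final class FakeDatanode {
    long storedBytes = 0;      // data written by writeChunk
    long committedLength = 0;  // block length persisted by putBlock

    void writeChunk(long len, boolean fail) {
      if (fail) {
        throw new IllegalStateException("writeChunk failed");
      }
      storedBytes += len;
    }

    void putBlock(long blockLength) {
      committedLength = blockLength;
    }
  }

  /**
   * Serves one failing writeChunk(4) and one putBlock(4) in the given order.
   * Returns true if the persisted length ends up larger than the stored data.
   */
  public static boolean inconsistent(boolean putBlockServedFirst) {
    FakeDatanode dn = new FakeDatanode();
    if (putBlockServedFirst) {
      // Out of order: the length is persisted before the data write is attempted.
      dn.putBlock(4);
      try {
        dn.writeChunk(4, true);
      } catch (IllegalStateException e) {
        // Too late: the metadata already claims 4 bytes.
      }
    } else {
      // In order: the failed write is seen first, so putBlock is never issued.
      try {
        dn.writeChunk(4, true);
        dn.putBlock(4);
      } catch (IllegalStateException e) {
        // Nothing was committed.
      }
    }
    return dn.committedLength > dn.storedBytes;
  }

  public static void main(String[] args) {
    System.out.println("putBlock served first: inconsistent = " + inconsistent(true));
    System.out.println("writeChunk served first: inconsistent = " + inconsistent(false));
  }
}
```

In this model only the out-of-order case leaves the persisted length ahead of
the stored data, which is the inconsistency described above.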
Other write requests on the container protocol can be async; they are
synchronized anyway in the ContainerProtocolCalls class that we use to issue
these requests from all other client code.
During the EC case discussion we agreed on the following:
In case of an EC write, we synchronize before and after the putBlock
request, which happens once the parity blocks have already been written. This is
necessary because if any of the writes in the given stripe fails, we acquire
a new block group and rewrite the stripe data, as this takes off the burden of
recovering data after sporadic stripe write failures.
This synchronization during EC writes allows us to enable async writeChunk and
putBlock calls in the client for at least the writes that go to EC
pipelines, and we can easily check for that within the sendCommandAsync call.
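The EC flow described above could look roughly like the following Java sketch.
This is an illustration under stated assumptions, not the actual client code:
EcStripeWriteSketch, ReplicationType, writeStripe and this simplified
sendCommandAsync signature are all hypothetical, requests are modelled as plain
Runnables, and a single putBlock stands in for the per-node commits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Hypothetical sketch of the EC write flow; none of these names are Ozone APIs. */
public class EcStripeWriteSketch {

  /** Assumed pipeline kinds, for illustration only. */
  public enum ReplicationType { RATIS, EC }

  /**
   * Assumed shape of the check: EC requests return immediately, while all
   * other pipelines wait for the reply before the future is handed back.
   */
  public static CompletableFuture<Void> sendCommandAsync(
      ReplicationType type, Runnable request) {
    CompletableFuture<Void> future = CompletableFuture.runAsync(request);
    if (type != ReplicationType.EC) {
      try {
        future.join();
      } catch (CompletionException e) {
        // The caller still observes the failure through the returned future.
      }
    }
    return future;
  }

  /**
   * Writes one stripe. Returns true if the stripe was committed, or false if
   * the caller should acquire a new block group and rewrite the stripe.
   */
  public static boolean writeStripe(List<Runnable> chunkWrites, Runnable putBlock) {
    List<CompletableFuture<Void>> pending = new ArrayList<>();
    for (Runnable write : chunkWrites) {
      pending.add(sendCommandAsync(ReplicationType.EC, write));
    }
    try {
      // Sync point before putBlock: every data and parity chunk must be done.
      CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
      // Sync point after putBlock: wait for the commit itself.
      sendCommandAsync(ReplicationType.EC, putBlock).join();
      return true;
    } catch (CompletionException e) {
      // Any failed write in the stripe means the whole stripe is rewritten.
      return false;
    }
  }

  public static void main(String[] args) {
    Runnable okChunk = () -> { };
    Runnable badChunk = () -> { throw new IllegalStateException("chunk failed"); };
    System.out.println(writeStripe(List.of(okChunk, okChunk), () -> { }));
    System.out.println(writeStripe(List.of(okChunk, badChunk), () -> { }));
  }
}
```

Because putBlock is only issued after all chunk writes of the stripe have
completed, the ordering problem from #3 does not arise inside a stripe in this
model, even though the individual calls are asynchronous.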
We agreed that we might benefit from solving #4 and reusing stream observers,
though this topic might need some more background checking that I will do. If
this turns out to be a bigger work item, I will create a new JIRA for it later on.
> EC: Review the TODOs in GRPC Xceiver client and fix them.
> ---------------------------------------------------------
>
> Key: HDDS-5954
> URL: https://issues.apache.org/jira/browse/HDDS-5954
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Uma Maheswara Rao G
> Assignee: István Fajth
> Priority: Major
>
> Currently there are 4 TODO-s in the GRPC client.
> 1. L331 adds a note that we should cache the current leader, so that we can
> go to the leader next time.
> 2. L422 adds a note about sendCommandAsync, which states that it is not
> async. The code on the other hand seems to be returning a CompletableFuture
> instance wrapped inside an XceiverClientReply, though sometimes we wait on
> the future before really returning.
> 3. L452 notes that async requests are served out of order, and this should be
> revisited if we make the API async.
> 4. L483 is connected to #2, and it notes that we should reuse stream
> observers if we are going down the async route
> The latter three require deeper investigation and understanding, to see how
> we can approach fixing them, and to figure out whether we really need to fix them.