errose28 commented on PR #7111:
URL: https://github.com/apache/ozone/pull/7111#issuecomment-2310440847

   Hi @slfan1989 this is being developed as part of the container 
reconciliation feature in HDDS-10239. This feature provides two high-level 
capabilities for containers:
   1. The ability to report their contents to SCM via a container level hash 
which can be compared to other replicas.
   2. The ability to "reconcile" a container replica with its peers when that 
hash differs. This means making incremental updates to a container based on 
data a peer node has that the current node may be missing or have lost.
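
   As a rough illustration of the two capabilities above, here is a minimal sketch. It is not the HDDS-10239 implementation: the class `ReplicaReconcileSketch`, its methods, and the hashing scheme are hypothetical, and each replica is assumed to be describable as a map from block ID to a per-block checksum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; names and hashing scheme are assumptions, not Ozone APIs. */
final class ReplicaReconcileSketch {

  /** Illustrative container-level hash built from per-block checksums. */
  static long containerHash(Map<Long, Long> blockChecksums) {
    long hash = 17;
    // Copy into a TreeMap so blocks are visited in block ID order and the
    // result does not depend on the iteration order of the input map.
    for (Map.Entry<Long, Long> e : new TreeMap<>(blockChecksums).entrySet()) {
      hash = 31 * hash + Long.hashCode(e.getKey());
      hash = 31 * hash + Long.hashCode(e.getValue());
    }
    return hash;
  }

  /**
   * Returns the block IDs that the peer has and the local replica lacks.
   * Blocks present on both sides with different checksums are not handled
   * here; deciding which copy is correct is outside this sketch.
   */
  static List<Long> blocksMissingLocally(Map<Long, Long> local, Map<Long, Long> peer) {
    List<Long> missing = new ArrayList<>();
    if (containerHash(local) == containerHash(peer)) {
      return missing; // hashes agree, nothing to reconcile
    }
    for (Long blockId : new TreeMap<>(peer).keySet()) {
      if (!local.containsKey(blockId)) {
        missing.add(blockId);
      }
    }
    return missing;
  }
}
```

   In this sketch a replica holding blocks 1 and 2 that compares itself with a peer holding blocks 1, 2 and 3 would get block 3 back as the incremental update to fetch.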
   
   The current design document can be found 
[here](https://github.com/apache/ozone/blob/HDDS-10239-container-reconciliation/hadoop-hdds/docs/content/design/container-reconciliation.md).
 In particular you can refer to the section on [phases of 
implementation](https://github.com/apache/ozone/blob/HDDS-10239-container-reconciliation/hadoop-hdds/docs/content/design/container-reconciliation.md#phase-i-outlined-in-this-document).
 We are currently implementing phase 1, which only applies to Ratis containers. 
Support for EC containers is in phase 3, which we have not planned for yet. 
This is because EC already has a reconciliation algorithm as described in (2) 
above, which is reconstruction.
   > For 3-replica blocks, if we find that a block write operation has an 
issue, we can repair it using the other replicas. 
   
   So in this case, the fix should be made in the reconstruction code path, 
since that is an existing way to repair EC containers after they have been 
closed.
   
   > However, for EC blocks, it becomes more challenging to determine the true 
length of the block.
   
   EC and Ratis differ here. In Ratis the longest block length wins, because we 
have a quorum on the server side to commit the last write. In EC, the shortest 
block wins because it is up to the client to make sure all datanode replicas 
have committed the last issued write before the client commits that length back 
to the OM. If only a few datanodes commit, that stripe is invalid and not 
committed back to OM.
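
   The length rule above can be sketched as follows. This is only an illustration under stated assumptions: `BlockLengthRuleSketch`, `ReplicationType`, and `resolveLength` are hypothetical names rather than Ozone classes, and each replica is assumed to report a single committed length for the block.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the "longest wins" vs "shortest wins" rule; not Ozone code. */
final class BlockLengthRuleSketch {

  enum ReplicationType { RATIS, EC }

  /**
   * Picks the block length to trust from the lengths reported by each replica.
   * RATIS: the longest length wins, since the last write was committed by a
   * server-side quorum. EC: the shortest length wins, since a write only
   * counts once every datanode replica has committed it.
   */
  static long resolveLength(ReplicationType type, List<Long> replicaLengths) {
    if (replicaLengths.isEmpty()) {
      throw new IllegalArgumentException("at least one replica length is required");
    }
    return type == ReplicationType.RATIS
        ? Collections.max(replicaLengths)
        : Collections.min(replicaLengths);
  }
}
```

   For example, with reported lengths of 100, 200 and 200, this sketch resolves to 200 for Ratis and to 100 for EC.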


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

