Ivan Andika created HDDS-15271:
----------------------------------

             Summary: Client should prioritize replicas with BCSID covering the blocks.
                 Key: HDDS-15271
                 URL: https://issues.apache.org/jira/browse/HDDS-15271
             Project: Apache Ozone
          Issue Type: Improvement
            Reporter: Ivan Andika
            Assignee: Ivan Andika


Currently, client reads prioritize replicas based on locality, datanode status 
(maintenance & decommission), etc. However, the client does not check whether the 
replica's BCSID covers the block the client is trying to read. This can cause a 
BCSID_MISMATCH error, which triggers a failover and increases read latency.
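A minimal Java sketch of why this costs latency, under simplifying assumptions: the client tries replicas strictly in its sorted order, and a datanode is modeled as rejecting the read with BCSID_MISMATCH whenever its container BCSID is below the block's BCSID. `FailoverModel`, `Replica` and `attemptsUntilSuccess` are hypothetical names, not Ozone classes.

```java
import java.util.List;

/**
 * Hypothetical model of the current read path: replicas are tried in the
 * order the client sorted them, and a replica whose container BCSID is
 * below the block's BCSID fails with BCSID_MISMATCH, forcing a failover.
 */
final class FailoverModel {

    /** Minimal stand-in for a container replica; not an Ozone type. */
    record Replica(String datanode, long containerBcsid) {}

    /**
     * Returns how many replicas are contacted before the read succeeds,
     * or -1 if every replica reports BCSID_MISMATCH.
     */
    static int attemptsUntilSuccess(List<Replica> ordered, long blockBcsid) {
        int attempts = 0;
        for (Replica r : ordered) {
            attempts++;
            if (r.containerBcsid() >= blockBcsid) {
                return attempts; // replica has committed the block: read succeeds
            }
            // Otherwise the datanode rejects the read (modeled BCSID_MISMATCH)
            // and the client fails over to the next replica, paying another round trip.
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Replica> byLocality = List.of(
                new Replica("dn-local", 95),    // closest, but lagging behind
                new Replica("dn-rack", 98),     // also lagging
                new Replica("dn-remote", 100)); // has the block
        System.out.println(attemptsUntilSuccess(byLocality, 100)); // prints 3
    }
}
```

Here the two closest replicas lag behind, so the read only succeeds on the third round trip.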

The idea of this patch is to also consider the BCSID as a hint (not a 
requirement) when the client picks a datanode. If a client requests a block with 
BCSID N, any datanode whose replica has BCSID >= N should be prioritized over 
those whose replica has BCSID < N.
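One way to express this hint, sketched in Java with hypothetical types (`BcsidPreference`, `Replica` and `prioritize` are illustrative, not the Ozone client API): map each replica to a two-valued rank and sort by it. Because `List.sort` is stable, replicas keep the order they arrived in (for example, the locality order) within each group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: prefer replicas whose BCSID covers the requested block. */
final class BcsidPreference {

    record Replica(String datanode, long bcsid) {}

    /** 0 if the replica's BCSID covers the block (>= N), 1 otherwise. */
    static int coverageRank(Replica r, long requestedBcsid) {
        return r.bcsid() >= requestedBcsid ? 0 : 1;
    }

    /**
     * Moves covering replicas ahead of non-covering ones without dropping any.
     * List.sort is stable, so the incoming order is kept within each group.
     */
    static List<Replica> prioritize(List<Replica> ordered, long requestedBcsid) {
        List<Replica> result = new ArrayList<>(ordered);
        result.sort(Comparator.comparingInt(r -> coverageRank(r, requestedBcsid)));
        return result;
    }

    public static void main(String[] args) {
        List<Replica> byLocality = List.of(
                new Replica("dn-local", 95),
                new Replica("dn-rack", 100),
                new Replica("dn-remote", 120));
        for (Replica r : prioritize(byLocality, 100)) {
            System.out.println(r.datanode()); // dn-rack, dn-remote, dn-local
        }
    }
}
```

With N = 100, `dn-rack` and `dn-remote` move ahead of the lagging `dn-local`, which is kept at the end rather than removed.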

However, we need to note a few things:
* We should not exclude the replicas with BCSID < N, since the container replica 
BCSID known to the client might be stale (either the container location cache is 
stale or the latest container replica heartbeat has not been recorded by SCM). 
This means that we will still read from replicas with BCSID < N if reads from the 
preceding replicas with BCSID >= N fail.
* We need to treat all BCSIDs >= N as equal. So replica 1 with BCSID N + 1 and 
replica 2 with BCSID N + 2 are ranked the same, even though replica 2 is more 
up-to-date. This should prevent hotspots, since otherwise every client would 
favor whichever replica has the highest BCSID.
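The two points above can be illustrated with a small Java comparison, again using hypothetical names (`HotspotDemo`, `byCoverage`, `byHighestBcsid`). Two clients see the same replicas in different locality orders; the two-valued coverage key lets each client keep its own preferred covering replica, whereas sorting by the highest BCSID sends both clients to the same datanode.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical comparison of two ways to rank replicas by BCSID. */
final class HotspotDemo {

    record Replica(String datanode, long bcsid) {}

    /** Proposed: all replicas with BCSID >= n share one rank; prior order breaks ties. */
    static List<Replica> byCoverage(List<Replica> ordered, long n) {
        List<Replica> result = new ArrayList<>(ordered);
        result.sort(Comparator.comparingInt(r -> r.bcsid() >= n ? 0 : 1));
        return result;
    }

    /** Alternative the issue argues against: most up-to-date replica first. */
    static List<Replica> byHighestBcsid(List<Replica> ordered) {
        List<Replica> result = new ArrayList<>(ordered);
        result.sort(Comparator.comparingLong(Replica::bcsid).reversed());
        return result;
    }

    public static void main(String[] args) {
        long n = 100;
        Replica a = new Replica("dn-a", n + 1);
        Replica b = new Replica("dn-b", n + 2);
        Replica c = new Replica("dn-c", n - 1);
        List<Replica> client1 = List.of(a, b, c); // client 1 is closest to dn-a
        List<Replica> client2 = List.of(c, b, a); // client 2 is closest to dn-c

        System.out.println(byCoverage(client1, n).get(0).datanode());  // dn-a
        System.out.println(byCoverage(client2, n).get(0).datanode());  // dn-b
        System.out.println(byHighestBcsid(client1).get(0).datanode()); // dn-b
        System.out.println(byHighestBcsid(client2).get(0).datanode()); // dn-b
    }
}
```

Both sorted lists still contain all three replicas, so the lagging `dn-c` remains available as a fallback.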

We can include the BCSID as one of the sorting criteria for client reads.
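As a sketch of how the BCSID could sit alongside the existing criteria, the following Java comparator chains operational state, BCSID coverage and network distance. The key order, the `Replica` fields and the `ReadOrder` class are all assumptions for illustration; the issue does not specify where BCSID ranks relative to locality.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical multi-key read ordering; not the Ozone client API. */
final class ReadOrder {

    /** Illustrative replica view; field names are assumptions. */
    record Replica(String datanode, boolean inService, long bcsid, int distance) {}

    static Comparator<Replica> readOrder(long requestedBcsid) {
        return Comparator
                // 1. In-service datanodes before maintenance/decommissioning ones.
                .comparingInt((Replica r) -> r.inService() ? 0 : 1)
                // 2. Replicas whose BCSID covers the block (>= N), all treated as equal.
                .thenComparingInt(r -> r.bcsid() >= requestedBcsid ? 0 : 1)
                // 3. Closer datanodes first.
                .thenComparingInt(Replica::distance);
    }

    static List<Replica> sortForRead(List<Replica> replicas, long requestedBcsid) {
        List<Replica> result = new ArrayList<>(replicas);
        result.sort(readOrder(requestedBcsid));
        return result;
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica("dn-local", true, 95, 0),
                new Replica("dn-decom", false, 120, 2),
                new Replica("dn-rack", true, 100, 2),
                new Replica("dn-remote", true, 130, 4));
        for (Replica r : sortForRead(replicas, 100)) {
            System.out.println(r.datanode());
        }
    }
}
```

With N = 100 this yields `dn-rack`, `dn-remote`, `dn-local`, `dn-decom`. Placing the coverage key before distance means a lagging local replica is tried after a covering remote one; putting distance first instead would make the BCSID only a tie-breaker among equally close replicas.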



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
