Andrey Yarovoy created HDDS-15680:
-------------------------------------

             Summary: ECFileChecksumHelper rebuilds the checksum pipeline on 
every block instead of caching per placement group
                 Key: HDDS-15680
                 URL: https://issues.apache.org/jira/browse/HDDS-15680
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Andrey Yarovoy


{{ECFileChecksumHelper.getChunkInfos}} is called once per block during EC file 
checksum computation. Each call unconditionally rebuilds, from scratch, the 
STANDALONE pipeline used to contact datanodes:
 # Iterates all N EC nodes (9 for EC 6+3), calling 
{{pipeline.getReplicaIndex(dn)}} for each to filter to replica index 1 and 
parity nodes
 # Sorts the selected node UUIDs into a string key and calls 
{{UUID.nameUUIDFromBytes}} (a hash computation) to derive the deterministic 
pipeline ID
 # Allocates a new {{Pipeline}} object via {{Pipeline.newBuilder()}} with 5 
field assignments
 # Calls {{pipeline.getReplicaIndexes()}} as an argument to 
{{ContainerProtocolCalls.getBlock}}. This streams over the already-filtered 
nodes, calls {{getReplicaIndex}} on each again, and allocates a new 
{{Map}}, even though the identical map ({{selectedReplicaIndexes}}) was 
just built in the same method

For a file where all blocks reside in one EC placement group (the common case), 
steps 1–4 produce identical output on every block. A file with B blocks 
therefore performs B full reconstructions instead of 1.
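
One possible shape for the fix is sketched below. It is a minimal illustration, not the Ozone implementation: {{ChecksumPipelineCache}}, {{Node}} and {{DerivedPipeline}} are hypothetical stand-ins for the real {{Pipeline}} and datanode classes, and keying the cache by a placement group ID (for example, the source pipeline's ID) is an assumption. The node filter (replica index 1 plus parity indexes) and the sorted-UUID key follow steps 1 and 2 above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types are the Ozone classes.
final class ChecksumPipelineCache {

  /** A datanode and its EC replica index within one placement group. */
  static final class Node {
    final UUID id;
    final int replicaIndex;

    Node(UUID id, int replicaIndex) {
      this.id = id;
      this.replicaIndex = replicaIndex;
    }
  }

  /** Derived pipeline ID plus the replica index map, built once and reused. */
  static final class DerivedPipeline {
    final UUID id;
    final Map<UUID, Integer> replicaIndexes;

    DerivedPipeline(UUID id, Map<UUID, Integer> replicaIndexes) {
      this.id = id;
      this.replicaIndexes = replicaIndexes;
    }
  }

  private final int dataCount; // 6 for EC 6+3
  private final Map<UUID, DerivedPipeline> cache = new HashMap<>();
  private int builds;          // counts actual reconstructions

  ChecksumPipelineCache(int dataCount) {
    this.dataCount = dataCount;
  }

  /** Returns the derived pipeline for a placement group, building it on first use only. */
  DerivedPipeline get(UUID placementGroupId, List<Node> nodes) {
    return cache.computeIfAbsent(placementGroupId, ignored -> build(nodes));
  }

  int builds() {
    return builds;
  }

  private DerivedPipeline build(List<Node> nodes) {
    builds++;
    // Step 1: keep replica index 1 and the parity nodes.
    Map<UUID, Integer> selected = new LinkedHashMap<>();
    for (Node n : nodes) {
      if (n.replicaIndex == 1 || n.replicaIndex > dataCount) {
        selected.put(n.id, n.replicaIndex);
      }
    }
    // Step 2: derive a deterministic ID from the sorted node UUIDs.
    String key = selected.keySet().stream()
        .map(UUID::toString)
        .sorted()
        .collect(Collectors.joining(","));
    UUID id = UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
    // Steps 3-4: the map is kept and reused instead of being recomputed per block.
    return new DerivedPipeline(id, Collections.unmodifiableMap(selected));
  }
}
```

With a cache like this, a per-block caller would invoke {{get}} for each block and only pay for steps 1–4 when the placement group changes. The sketch is not thread-safe; whether that matters depends on how the helper is used.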

 



