[
https://issues.apache.org/jira/browse/HDDS-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrey Yarovoy reassigned HDDS-15680:
-------------------------------------
Assignee: Andrey Yarovoy
> ECFileChecksumHelper rebuilds the checksum pipeline on every block instead of
> caching per placement group
> ---------------------------------------------------------------------------------------------------------
>
> Key: HDDS-15680
> URL: https://issues.apache.org/jira/browse/HDDS-15680
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Andrey Yarovoy
> Assignee: Andrey Yarovoy
> Priority: Major
>
> {{ECFileChecksumHelper.getChunkInfos}} is called once per block during EC
> file checksum computation. Each call unconditionally rebuilds, from scratch,
> the STANDALONE pipeline used to contact datanodes:
> # Iterates all N EC nodes (9 for EC 6+3), calling
> {{pipeline.getReplicaIndex(dn)}} for each to filter to replica index 1 and
> parity nodes
> # Sorts the selected node UUIDs into a string key and calls
> {{UUID.nameUUIDFromBytes}} (a hash computation) to derive the deterministic
> pipeline ID
> # Allocates a new {{Pipeline}} object via {{Pipeline.newBuilder()}} with 5
> field assignments
> # Calls {{pipeline.getReplicaIndexes()}} as an argument to
> {{ContainerProtocolCalls.getBlock}} — this streams over the already-filtered
> nodes, calls {{getReplicaIndex}} on each again, and allocates a new
> {{Map}}, even though the identical map ({{selectedReplicaIndexes}})
> was just built in the same method
> For a file where all blocks reside in one EC placement group (the common
> case), steps 1–4 produce identical output on every block. A file with B
> blocks therefore performs B full reconstructions instead of 1.
>
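A minimal sketch of the per-placement-group caching the summary describes, written against plain JDK types rather than the real Ozone classes. Everything here is hypothetical: {{DerivedPipeline}}, {{PipelineCacheSketch}}, {{forBlock}} and the {{builds}} counter are illustrative names, and using the EC pipeline ID as the cache key assumes that ID uniquely identifies a placement group whose membership does not change while the file checksum is being computed. The sketch runs steps 1-3 once per group and keeps the filtered replica-index map so it can be reused for step 4 instead of being recomputed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for the derived STANDALONE pipeline; not an Ozone class.
final class DerivedPipeline {
  final UUID id;
  final Map<UUID, Integer> selectedReplicaIndexes;

  DerivedPipeline(UUID id, Map<UUID, Integer> selectedReplicaIndexes) {
    this.id = id;
    this.selectedReplicaIndexes =
        Collections.unmodifiableMap(selectedReplicaIndexes);
  }
}

// Hypothetical cache, scoped to one file checksum computation (not thread-safe).
final class PipelineCacheSketch {
  private final int dataCount; // e.g. 6 for EC 6+3
  private final Map<UUID, DerivedPipeline> cache = new HashMap<>();
  int builds = 0; // counts full reconstructions, for illustration only

  PipelineCacheSketch(int dataCount) {
    this.dataCount = dataCount;
  }

  // Assumption: ecPipelineId identifies the placement group of the block.
  DerivedPipeline forBlock(UUID ecPipelineId, Map<UUID, Integer> replicaIndexes) {
    return cache.computeIfAbsent(ecPipelineId, k -> build(replicaIndexes));
  }

  private DerivedPipeline build(Map<UUID, Integer> replicaIndexes) {
    builds++;
    // Step 1: keep replica index 1 and the parity nodes (index > dataCount).
    Map<UUID, Integer> selected = new HashMap<>();
    for (Map.Entry<UUID, Integer> e : replicaIndexes.entrySet()) {
      int index = e.getValue();
      if (index == 1 || index > dataCount) {
        selected.put(e.getKey(), index);
      }
    }
    // Step 2: sort the node UUIDs into a string key and hash it into a
    // deterministic, name-based pipeline ID.
    List<String> names = new ArrayList<>();
    for (UUID dn : selected.keySet()) {
      names.add(dn.toString());
    }
    Collections.sort(names);
    UUID id = UUID.nameUUIDFromBytes(
        String.join(",", names).getBytes(StandardCharsets.UTF_8));
    // Steps 3 and 4: allocate the derived object once and keep the filtered
    // map so callers reuse it instead of rebuilding it per block.
    return new DerivedPipeline(id, selected);
  }
}
```

With a cache like this, every block after the first in the same group costs a single map lookup. A real change would also need to decide when a cached entry becomes stale, for example if a datanode in the group is replaced; the sketch does not address that.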
--
This message was sent by Atlassian Jira
(v8.20.10#820010)