[
https://issues.apache.org/jira/browse/HDDS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai updated HDDS-15643:
------------------------------------
Fix Version/s: 2.3.0
Resolution: Fixed
Status: Resolved (was: Patch Available)
> ECFileChecksumHelper: redundant OM lookupKey RPC and per-file gRPC connection
> creation for EC checksum collection
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-15643
> URL: https://issues.apache.org/jira/browse/HDDS-15643
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Andrey Yarovoy
> Assignee: Andrey Yarovoy
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.3.0
>
>
> *Description:*
> Checksum collection for EC files has three structural inefficiencies that
> make each file's cost far higher than necessary. All three are present in the
> current code and compound under any non-trivial OM latency.
> *Bug 1: Double {{lookupKey}} RPC per file ({{BaseFileChecksumHelper}})*
> The 7-arg constructor (which accepts a pre-fetched {{OmKeyInfo}}) delegates
> to the 6-arg constructor via {{this(...)}}. The 6-arg constructor calls
> {{fetchBlocks()}} before returning, and {{fetchBlocks()}} checks
> {{if (keyInfo == null)}} to decide whether to issue a {{lookupKey}} RPC.
> Because {{this.keyInfo = keyInfo}} executes only after the delegation
> returns, {{keyInfo}} is always null at the time of that check, so a
> redundant {{lookupKey}} is issued for every file even when the caller
> already supplied the key info.
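
The ordering problem in Bug 1 can be reproduced outside Ozone. The following is a minimal sketch, not the actual {{BaseFileChecksumHelper}} code: the class names, the single {{keyName}} argument, and the RPC counter are all hypothetical stand-ins, and the reordered variant is only one possible restructuring, not necessarily what the patch does.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ConstructorOrderSketch {
  // Counts simulated lookupKey RPCs (hypothetical stand-in for the OM call).
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  // Hypothetical stand-in for OmKeyInfo.
  static class KeyInfo { }

  // Mirrors the ordering described in Bug 1.
  static class BuggyHelper {
    private KeyInfo keyInfo;

    // Stand-in for the 6-arg constructor: fetches blocks before returning.
    BuggyHelper(String keyName) {
      fetchBlocks();
    }

    // Stand-in for the 7-arg constructor: the field is assigned too late.
    BuggyHelper(String keyName, KeyInfo keyInfo) {
      this(keyName);            // fetchBlocks() runs here, keyInfo is still null
      this.keyInfo = keyInfo;
    }

    private void fetchBlocks() {
      if (keyInfo == null) {
        LOOKUPS.incrementAndGet();  // simulated redundant lookupKey RPC
        keyInfo = new KeyInfo();
      }
    }
  }

  // One possible reordering: assign the field before fetching blocks.
  static class ReorderedHelper {
    private KeyInfo keyInfo;

    ReorderedHelper(String keyName) {
      this(keyName, null);
    }

    ReorderedHelper(String keyName, KeyInfo keyInfo) {
      this.keyInfo = keyInfo;   // set first, so fetchBlocks() can see it
      fetchBlocks();
    }

    private void fetchBlocks() {
      if (keyInfo == null) {
        LOOKUPS.incrementAndGet();
        keyInfo = new KeyInfo();
      }
    }
  }

  static int lookupsWithBuggyOrder() {
    LOOKUPS.set(0);
    new BuggyHelper("key", new KeyInfo());
    return LOOKUPS.get();
  }

  static int lookupsWithReorderedConstructors() {
    LOOKUPS.set(0);
    new ReorderedHelper("key", new KeyInfo());
    return LOOKUPS.get();
  }

  public static void main(String[] args) {
    System.out.println(lookupsWithBuggyOrder());            // prints 1
    System.out.println(lookupsWithReorderedConstructors()); // prints 0
  }
}
```

In the buggy variant the caller passes a key info object and still pays for one simulated lookup; in the reordered variant the lookup is skipped.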
> *Bug 2: New gRPC connection opened for every file
> ({{ECFileChecksumHelper}})*
> {{getChunkInfos()}} builds a 3-node STANDALONE pipeline to read the stripe
> checksum (replica index 1 plus the two parity nodes). It calls
> {{pipeline.toBuilder().setNodes(nodes).build()}}.
> {{Pipeline.Builder.setNodes()}} detects that the 3-node set differs from the
> 5-node EC {{nodeStatus}} and unconditionally calls
> {{PipelineID.randomId()}}, generating a fresh random UUID per file. Since
> {{XceiverClientManager}} keys its gRPC connection cache on pipeline ID, the
> cache never hits and a new connection is opened for every file.
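
The cache-miss behaviour in Bug 2 can be illustrated with a small model. This is a sketch under stated assumptions, not Ozone code: {{Pipeline}}, {{ClientCache}}, the datanode names, and the name-based ID are hypothetical, and the stable-ID variant is only one way to make the cache hit, not necessarily the approach taken in the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class PipelineCacheSketch {
  // Hypothetical stand-in for a pipeline: an ID plus its node list.
  static final class Pipeline {
    final UUID id;
    final List<String> nodes;

    Pipeline(UUID id, List<String> nodes) {
      this.id = id;
      this.nodes = nodes;
    }

    // Mimics the described behaviour: a changed node set gets a random ID.
    Pipeline withNodesRandomId(List<String> newNodes) {
      UUID newId = newNodes.equals(nodes) ? id : UUID.randomUUID();
      return new Pipeline(newId, newNodes);
    }

    // Assumed alternative: derive the ID deterministically from the node set.
    Pipeline withNodesStableId(List<String> newNodes) {
      byte[] name = String.join(",", newNodes).getBytes(StandardCharsets.UTF_8);
      return new Pipeline(UUID.nameUUIDFromBytes(name), newNodes);
    }
  }

  // Hypothetical stand-in for a connection cache keyed on pipeline ID.
  static final class ClientCache {
    private final Map<UUID, Object> connections = new HashMap<>();
    int opened;

    Object acquire(Pipeline pipeline) {
      return connections.computeIfAbsent(pipeline.id, id -> {
        opened++;             // cache miss: a new connection is created
        return new Object();
      });
    }
  }

  static Pipeline ecPipeline() {
    return new Pipeline(UUID.randomUUID(),
        Arrays.asList("dn1", "dn2", "dn3", "dn4", "dn5"));
  }

  // One sub-pipeline per file: replica index 1 plus the two parity nodes.
  static int connectionsOpenedWithRandomId(int files) {
    Pipeline ec = ecPipeline();
    ClientCache cache = new ClientCache();
    for (int i = 0; i < files; i++) {
      cache.acquire(ec.withNodesRandomId(Arrays.asList("dn1", "dn4", "dn5")));
    }
    return cache.opened;
  }

  static int connectionsOpenedWithStableId(int files) {
    Pipeline ec = ecPipeline();
    ClientCache cache = new ClientCache();
    for (int i = 0; i < files; i++) {
      cache.acquire(ec.withNodesStableId(Arrays.asList("dn1", "dn4", "dn5")));
    }
    return cache.opened;
  }

  public static void main(String[] args) {
    System.out.println(connectionsOpenedWithRandomId(5)); // prints 5
    System.out.println(connectionsOpenedWithStableId(5)); // prints 1
  }
}
```

With a random ID per file, the number of connections opened grows linearly with the number of files; with an ID that is stable for the same node set, the same files share one cached connection.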