Andrey Yarovoy created HDDS-15643:
-------------------------------------
Summary: ECFileChecksumHelper: redundant OM lookupKey RPC and
per-file gRPC connection creation for EC checksum collection
Key: HDDS-15643
URL: https://issues.apache.org/jira/browse/HDDS-15643
Project: Apache Ozone
Issue Type: Bug
Reporter: Andrey Yarovoy
*Description:*
Checksum collection for EC files has three structural inefficiencies that make
each file's cost far higher than necessary. All three are present in the
current code and compound under any non-trivial OM latency.
*Bug 1: Double {{lookupKey}} RPC per file ({{BaseFileChecksumHelper}})*
The 7-arg constructor (which accepts a pre-fetched {{OmKeyInfo}}) delegates
to {{this(6-arg)}}. The 6-arg constructor calls {{fetchBlocks()}} before
returning, and {{fetchBlocks()}} checks {{if (keyInfo == null)}} to decide
whether to issue a {{lookupKey}} RPC. Because {{this.keyInfo = keyInfo}}
executes only after the delegation returns, {{keyInfo}} is always null at the
time of that check, so a redundant {{lookupKey}} is issued for every file
regardless of whether the caller already supplied one.
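The ordering problem can be reproduced with a minimal, self-contained Java sketch. {{KeyInfo}}, {{BuggyHelper}}, {{FixedHelper}} and the counted {{lookupKey}} are hypothetical stand-ins, not the real Ozone classes. {{BuggyHelper}} mirrors the reported pattern: a field assigned after {{this(...)}} is still null when the delegated constructor runs its check. {{FixedHelper}} shows one possible restructuring (delegate in the opposite direction so the field is set before {{fetchBlocks()}}); this is an assumption, not necessarily how the issue is resolved.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, self-contained stand-ins; these are NOT the real Ozone classes.
public class CtorDelegationDemo {

    /** Stand-in for OmKeyInfo. */
    static final class KeyInfo {
        final String name;

        KeyInfo(String name) {
            this.name = name;
        }
    }

    /** Counts simulated lookupKey RPCs. */
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    /** Simulated OM lookupKey RPC. */
    static KeyInfo lookupKey(String name) {
        LOOKUPS.incrementAndGet();
        return new KeyInfo(name);
    }

    /** Mirrors the reported pattern: the "7-arg" ctor delegates first, then assigns. */
    static class BuggyHelper {
        private final String keyName;
        private KeyInfo keyInfo;

        // Analogue of the 6-arg constructor: fetches blocks before returning.
        BuggyHelper(String keyName) {
            this.keyName = keyName;
            fetchBlocks();
        }

        // Analogue of the 7-arg constructor: this(...) runs fetchBlocks() while
        // keyInfo is still null, so the pre-fetched value arrives too late.
        BuggyHelper(String keyName, KeyInfo prefetched) {
            this(keyName);
            this.keyInfo = prefetched;
        }

        private void fetchBlocks() {
            if (keyInfo == null) {
                keyInfo = lookupKey(keyName);
            }
        }
    }

    /** One possible restructuring: delegate the other way, assign before fetching. */
    static class FixedHelper {
        private final String keyName;
        private KeyInfo keyInfo;

        FixedHelper(String keyName) {
            this(keyName, null); // nothing pre-fetched: fetchBlocks() will look it up
        }

        FixedHelper(String keyName, KeyInfo prefetched) {
            this.keyName = keyName;
            this.keyInfo = prefetched; // set before the null check in fetchBlocks()
            fetchBlocks();
        }

        private void fetchBlocks() {
            if (keyInfo == null) {
                keyInfo = lookupKey(keyName);
            }
        }
    }

    public static void main(String[] args) {
        KeyInfo prefetched = new KeyInfo("vol/bucket/key");

        LOOKUPS.set(0);
        new BuggyHelper("vol/bucket/key", prefetched);
        System.out.println("buggy lookups: " + LOOKUPS.get()); // buggy lookups: 1

        LOOKUPS.set(0);
        new FixedHelper("vol/bucket/key", prefetched);
        System.out.println("fixed lookups: " + LOOKUPS.get()); // fixed lookups: 0
    }
}
```

Since {{this(...)}} has traditionally had to be the first statement of a constructor, the delegating constructor cannot simply set its field before delegating; reversing the delegation direction sidesteps that constraint.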
*Bug 2: New gRPC connection opened for every file ({{ECFileChecksumHelper}})*
{{getChunkInfos()}} builds a 3-node STANDALONE pipeline to read the stripe
checksum (replica index 1 plus the two parity nodes). It calls
{{pipeline.toBuilder().setNodes(nodes).build()}}.
{{Pipeline.Builder.setNodes()}} detects that the 3-node set differs from the
5-node EC {{nodeStatus}} and unconditionally calls
{{PipelineID.randomId()}}, generating a fresh random UUID per file. Since
{{XceiverClientManager}} keys its gRPC connection cache on pipeline ID, the
cache never hits and a new connection is opened for every file.
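The cache-miss pattern can be reproduced with a small Java sketch. {{Pipeline}}, {{ClientCache}}, {{withNodesRandomId}} and {{withNodesStableId}} are hypothetical stand-ins, not the real {{Pipeline.Builder}}, {{PipelineID}} or {{XceiverClientManager}} APIs, and the node names assume an RS-3-2 layout (5 nodes; replica index 1 plus the two parity nodes). {{withNodesRandomId}} mirrors the reported {{setNodes()}} behaviour, so 100 simulated files open 100 connections. {{withNodesStableId}} is included only as a contrast, showing that an ID derived deterministically from the node list lets an ID-keyed cache hit after the first file; it is not a proposed fix.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-ins for Pipeline, PipelineID and XceiverClientManager.
public class PipelineCacheDemo {

    static final class Pipeline {
        private final UUID id;
        private final List<String> nodes;

        Pipeline(UUID id, List<String> nodes) {
            this.id = id;
            this.nodes = List.copyOf(nodes);
        }

        UUID id() { return id; }

        List<String> nodes() { return nodes; }
    }

    /** Mimics the reported setNodes() behaviour: a different node set gets a random ID. */
    static Pipeline withNodesRandomId(Pipeline ec, List<String> nodes) {
        UUID id = nodes.equals(ec.nodes()) ? ec.id() : UUID.randomUUID();
        return new Pipeline(id, nodes);
    }

    /** Hypothetical contrast: an ID derived deterministically from the node list. */
    static Pipeline withNodesStableId(Pipeline ec, List<String> nodes) {
        if (nodes.equals(ec.nodes())) {
            return ec;
        }
        byte[] key = String.join(",", nodes).getBytes(StandardCharsets.UTF_8);
        return new Pipeline(UUID.nameUUIDFromBytes(key), nodes);
    }

    /** Connection cache keyed on pipeline ID; counts how many clients it creates. */
    static final class ClientCache {
        private final Map<UUID, String> clients = new HashMap<>();
        int created;

        String acquire(Pipeline p) {
            return clients.computeIfAbsent(p.id(), id -> {
                created++; // cache miss: a new "connection" is opened
                return "grpc-client-" + id;
            });
        }
    }

    public static void main(String[] args) {
        // Assumes an RS-3-2 layout: 5 nodes, replica index 1 plus two parity nodes.
        Pipeline ec = new Pipeline(UUID.randomUUID(),
                List.of("dn1", "dn2", "dn3", "dn4", "dn5"));
        List<String> stripeNodes = List.of("dn1", "dn4", "dn5");

        ClientCache randomIds = new ClientCache();
        ClientCache stableIds = new ClientCache();
        for (int file = 0; file < 100; file++) {
            randomIds.acquire(withNodesRandomId(ec, stripeNodes));
            stableIds.acquire(withNodesStableId(ec, stripeNodes));
        }
        System.out.println("random IDs: " + randomIds.created + " connections"); // 100
        System.out.println("stable IDs: " + stableIds.created + " connections"); // 1
    }
}
```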
--
This message was sent by Atlassian Jira
(v8.20.10#820010)