Sammi Chen created HDDS-14853:
---------------------------------
Summary: Reduce duplicate log message created by
ECReconstructionCoordinator#reconstructECBlockGroup
Key: HDDS-14853
URL: https://issues.apache.org/jira/browse/HDDS-14853
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Sammi Chen
Assignee: Sammi Chen
In an Ozone cluster, the datanode log file was found to be full of the
following messages:
{code:java}
2026-02-28 13:45:12,139 INFO
[ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block group details for conID: 79124 locID: 115816896931420725 bcsId: 0
replicaIndex: null. Replication Config EC{rs-3-2-1024k}. Calculated safe
length: 805306368.
2026-02-28 13:45:12,140 INFO
[ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
Block Data for: conID: 79124 locID: 115816896931420725 bcsId: 0 replicaIndex:
2 replica Index: 2 block length: 268435456 block group length: 805306368 chunk
list:
{code}
A 77 MB datanode log file with 1588052 lines covers the period from
2026-02-28 13:44:48 to 2026-02-28 13:45:12, less than 1 minute. It contains
no ERROR, WARN, or Exception entries, only messages of the type shown above.
Here is the code that produces these log messages. Assume one failed datanode
cannot be reached. In the while loop, each call to sis.recoverChunks returns
one full stripe of data (not sure if it is exactly one stripe, but likely far
less than the block size). Because failedIndexes is not empty,
logBlockGroupDetails is called on every iteration, so hundreds of duplicate
messages are created during a single block reconstruction.
{code:java}
if (!toReconstructIndexes.isEmpty()) {
  sis.setRecoveryIndexes(toReconstructIndexes.stream().map(i -> (i - 1))
      .collect(Collectors.toSet()));
  long length = safeBlockGroupLength;
  while (length > 0) {
    int readLen;
    try {
      readLen = sis.recoverChunks(bufs);
      Set<Integer> failedIndexes = sis.getFailedIndexes();
      if (!failedIndexes.isEmpty()) {
        // There was a problem reading some of the block indexes, but we
        // did not get an exception as there must have been spare indexes
        // to try and recover from. Therefore we should log out the block
        // group details in the same way as for the exception case below.
        logBlockGroupDetails(blockLocationInfo, repConfig,
            blockDataGroup);
      }
    } catch (IOException e) {
{code}
Since the inputs of logBlockGroupDetails() (blockLocationInfo, repConfig, and
blockDataGroup) do not change during the whole while loop, we can log them
just once.
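
The log-once idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: FakeStripeReader, the String-based logBlockGroupDetails, and the 3 MiB stripe size are hypothetical stand-ins for the real Ozone classes and values. A boolean flag guards the call so the details are logged on the first iteration that reports failed indexes and skipped afterwards.

```java
import java.util.Collections;
import java.util.Set;

/**
 * Sketch only: FakeStripeReader and the String-based logBlockGroupDetails are
 * hypothetical stand-ins, not the real Ozone classes or signatures.
 */
public class LogOnceSketch {

  /** Hypothetical reader that always returns one stripe and a fixed failed set. */
  static final class FakeStripeReader {
    private final int stripeLen;
    private final Set<Integer> failedIndexes;

    FakeStripeReader(int stripeLen, Set<Integer> failedIndexes) {
      this.stripeLen = stripeLen;
      this.failedIndexes = failedIndexes;
    }

    int recoverChunks() {
      return stripeLen;
    }

    Set<Integer> getFailedIndexes() {
      return failedIndexes;
    }
  }

  /** Stand-in for the real logging helper; prints instead of using a logger. */
  static void logBlockGroupDetails(String details) {
    System.out.println("Block group details for " + details);
  }

  /** Runs the recovery loop and returns how many detail messages were logged. */
  static int reconstruct(long safeBlockGroupLength, FakeStripeReader sis) {
    boolean detailsLogged = false; // set once the details have been written
    int messages = 0;
    long length = safeBlockGroupLength;
    while (length > 0) {
      int readLen = sis.recoverChunks();
      // Log only on the first iteration that reports failed indexes.
      if (!detailsLogged && !sis.getFailedIndexes().isEmpty()) {
        logBlockGroupDetails("conID: 79124");
        detailsLogged = true;
        messages++;
      }
      length -= readLen;
    }
    return messages;
  }

  public static void main(String[] args) {
    // 805306368 is the safe length from the log above; the stripe size is assumed.
    FakeStripeReader sis =
        new FakeStripeReader(3 * 1024 * 1024, Collections.singleton(1));
    System.out.println("messages logged: " + reconstruct(805306368L, sis));
  }
}
```

With the assumed 3 MiB stripe, the loop iterates 256 times for a safe length of 805306368 bytes, yet the details are logged a single time. An alternative would be to collect the failed indexes during the loop and log once after it finishes; the flag approach keeps the message close to the first failure.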