Sammi Chen created HDDS-14853:
---------------------------------

             Summary: Reduce duplicate log message created by ECReconstructionCoordinator#reconstructECBlockGroup
                 Key: HDDS-14853
                 URL: https://issues.apache.org/jira/browse/HDDS-14853
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Sammi Chen
            Assignee: Sammi Chen


In an Ozone cluster, we found that the datanode log file was full of the following messages:


{code:java}
2026-02-28 13:45:12,139 INFO [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block group details for conID: 79124 locID: 115816896931420725 bcsId: 0 replicaIndex: null. Replication Config EC{rs-3-2-1024k}. Calculated safe length: 805306368.
2026-02-28 13:45:12,140 INFO [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 79124 locID: 115816896931420725 bcsId: 0 replicaIndex: 2 replica Index: 2 block length: 268435456 block group length: 805306368 chunk list:
{code}

A 77 MB datanode log file with 1,588,052 lines covers only 2026-02-28 13:44:48 to 2026-02-28 13:45:12, less than one minute. It contains no ERROR, WARN, or exception entries, only messages of the types shown above.
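For a rough sense of the volume, the sketch below divides the quoted line count by the quoted time span. It uses only the numbers from this issue and treats the two timestamps as exact to the second, so the real span may be slightly longer and the rate slightly lower. The class and method names are illustrative only.

{code:java}
import java.time.Duration;
import java.time.LocalDateTime;

// Back-of-the-envelope log rate for the datanode log file described above.
class LogRate {

  // Average lines per second over [start, end], using integer division.
  static long linesPerSecond(long lines, LocalDateTime start, LocalDateTime end) {
    long seconds = Duration.between(start, end).getSeconds();
    return lines / seconds;
  }

  public static void main(String[] args) {
    LocalDateTime start = LocalDateTime.of(2026, 2, 28, 13, 44, 48);
    LocalDateTime end = LocalDateTime.of(2026, 2, 28, 13, 45, 12);
    System.out.println(Duration.between(start, end).getSeconds()); // 24
    System.out.println(linesPerSecond(1_588_052L, start, end));    // 66168
  }
}
{code}

That is on the order of 66,000 log lines per second, all at INFO level.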

Here is the code that produces these log messages. Suppose one failed datanode cannot be connected to. In the while loop, every call to sis.recoverChunks returns a full stripe of data (it is not certain that this is exactly one stripe, but it is likely far smaller than the block size). Because failedIndexes is not empty, logBlockGroupDetails is called once per iteration, so hundreds of duplicate messages are created during a single block reconstruction.


{code:java}
if (!toReconstructIndexes.isEmpty()) {
  sis.setRecoveryIndexes(toReconstructIndexes.stream().map(i -> (i - 1))
      .collect(Collectors.toSet()));
  long length = safeBlockGroupLength;
  while (length > 0) {
    int readLen;
    try {
      readLen = sis.recoverChunks(bufs);
      Set<Integer> failedIndexes = sis.getFailedIndexes();
      if (!failedIndexes.isEmpty()) {
        // There was a problem reading some of the block indexes, but we
        // did not get an exception as there must have been spare indexes
        // to try and recover from. Therefore we should log out the block
        // group details in the same way as for the exception case below.
        logBlockGroupDetails(blockLocationInfo, repConfig,
            blockDataGroup);
      }
    } catch (IOException e) {
{code}
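To see where "hundreds" comes from, here is a rough estimate of the loop iteration count for the block group in the log above. It assumes, as the description suggests but does not confirm, that each sis.recoverChunks(bufs) call returns exactly one full stripe, i.e. 3 data cells of 1024 KiB for EC{rs-3-2-1024k}. The class and method names are hypothetical.

{code:java}
// Rough estimate of reconstruction loop iterations, ASSUMING one full
// stripe per sis.recoverChunks(bufs) call (not confirmed in the issue).
class StripeEstimate {

  // Iterations needed to cover the safe block-group length, rounding up
  // for a trailing partial stripe.
  static long iterations(long safeBlockGroupLength, int dataUnits, long cellSize) {
    long stripeDataSize = dataUnits * cellSize;  // data bytes per full stripe
    return (safeBlockGroupLength + stripeDataSize - 1) / stripeDataSize;
  }

  public static void main(String[] args) {
    long safeLength = 805_306_368L;  // "Calculated safe length" from the log
    int dataUnits = 3;               // EC{rs-3-2-1024k}: 3 data units
    long cellSize = 1024L * 1024L;   // EC{rs-3-2-1024k}: 1024k cell size
    System.out.println(iterations(safeLength, dataUnits, cellSize)); // 256
  }
}
{code}

Under that assumption, reconstructing this one block group calls logBlockGroupDetails about 256 times, and each call emits a "Block group details" line plus at least one "Block Data for" line.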

Since the inputs of logBlockGroupDetails() (blockLocationInfo, repConfig, and blockDataGroup) do not change during the whole while loop, we can log them just once.
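A minimal sketch of the log-once idea, assuming a boolean guard local to the loop. It is a self-contained simulation, not the actual patch: the stripe reads, the failed-index set, and the log counter are hypothetical stand-ins for sis.recoverChunks, sis.getFailedIndexes, and logBlockGroupDetails.

{code:java}
import java.util.Set;

// Hypothetical simulation of the reconstruction loop; not Ozone code.
class LogOnceSketch {

  // Returns how many times the block group details would be logged.
  static int reconstruct(long safeBlockGroupLength, long stripeDataSize,
      Set<Integer> failedIndexes, boolean logOnce) {
    int logCalls = 0;
    boolean detailsLogged = false;  // the logged inputs never change in the loop
    long length = safeBlockGroupLength;
    while (length > 0) {
      // Stand-in for readLen = sis.recoverChunks(bufs): one stripe per call.
      long readLen = Math.min(stripeDataSize, length);
      // Stand-in for sis.getFailedIndexes(); held fixed here for simplicity.
      if (!failedIndexes.isEmpty() && !(logOnce && detailsLogged)) {
        logCalls++;  // stand-in for logBlockGroupDetails(...)
        detailsLogged = true;
      }
      length -= readLen;
    }
    return logCalls;
  }

  public static void main(String[] args) {
    long safeLength = 805_306_368L;
    long stripeDataSize = 3L * 1024 * 1024;
    Set<Integer> failed = Set.of(4);
    System.out.println(reconstruct(safeLength, stripeDataSize, failed, false)); // 256
    System.out.println(reconstruct(safeLength, stripeDataSize, failed, true));  // 1
  }
}
{code}

The guard is checked inside the loop rather than hoisting the call before it, because failedIndexes is only known after recoverChunks runs and may first become non-empty partway through the block group.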


