[ 
https://issues.apache.org/jira/browse/HDDS-14853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-14853:
------------------------------
    Summary: Reduce duplicate logs created by 
ECReconstructionCoordinator#reconstructECBlockGroup  (was: Reduce duplicate log 
message created by ECReconstructionCoordinator#reconstructECBlockGroup)

> Reduce duplicate logs created by 
> ECReconstructionCoordinator#reconstructECBlockGroup
> ------------------------------------------------------------------------------------
>
>                 Key: HDDS-14853
>                 URL: https://issues.apache.org/jira/browse/HDDS-14853
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>
> In an Ozone cluster, the datanode log file was found to be full of the 
> following messages:
> {code:java}
> 2026-02-28 13:45:12,139 INFO 
> [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
>  Block group details for conID: 79124 locID: 115816896931420725 bcsId: 0 
> replicaIndex: null. Replication Config EC{rs-3-2-1024k}. Calculated safe 
> length: 805306368. 
> 2026-02-28 13:45:12,140 INFO 
> [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator:
>  Block Data for: conID: 79124 locID: 115816896931420725 bcsId: 0 
> replicaIndex: 2 replica Index: 2 block length: 268435456 block group length: 
> 805306368 chunk list:
> {code}
> A 77 MB datanode log file with 1588052 lines covers only 2026-02-28 13:44:48 
> to 2026-02-28 13:45:12, less than one minute. It contains no ERROR, WARN, or 
> Exception entries, only messages of the type shown above.
> Here is the code that produces these log messages. Assume one failed datanode 
> cannot be reached. In the while loop, every time sis.recoverChunks returns a 
> full stripe of data (not sure if it is exactly one stripe, but likely far less 
> than the block size), failedIndexes is not empty, so logBlockGroupDetails is 
> called once per iteration. As a result, hundreds of duplicate messages are 
> created during one block reconstruction.
> {code:java}
> if (!toReconstructIndexes.isEmpty()) {
>   sis.setRecoveryIndexes(toReconstructIndexes.stream().map(i -> (i - 1))
>       .collect(Collectors.toSet()));
>   long length = safeBlockGroupLength;
>   while (length > 0) {
>     int readLen;
>     try {
>       readLen = sis.recoverChunks(bufs);
>       Set<Integer> failedIndexes = sis.getFailedIndexes();
>       if (!failedIndexes.isEmpty()) {
>         // There was a problem reading some of the block indexes, but we
>         // did not get an exception as there must have been spare indexes
>         // to try and recover from. Therefore we should log out the block
>         // group details in the same way as for the exception case below.
>         logBlockGroupDetails(blockLocationInfo, repConfig,
>             blockDataGroup);
>       }
>     } catch (IOException e) {
> {code}
> Since the inputs to logBlockGroupDetails() (blockLocationInfo, repConfig, and 
> blockDataGroup) do not change during the while loop, logging once is enough.
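One possible way to log only once is to guard the call with a boolean flag that is set the first time failed indexes are seen. The following is a minimal, self-contained sketch of that idea, not the actual patch for this issue. The StripeReader interface, the detailsLogged flag, and the in-memory log list are hypothetical stand-ins for sis, logBlockGroupDetails, and the real logger. The sketch also assumes that the loop subtracts readLen from length on each iteration, which the excerpt above does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LogOnceSketch {

  /** Hypothetical stand-in for the stream used as "sis" in the excerpt. */
  public interface StripeReader {
    int recoverChunks();              // bytes recovered by this call
    Set<Integer> getFailedIndexes();  // indexes that could not be read
  }

  /**
   * Runs a loop shaped like the one in the excerpt and returns the
   * messages that would have been logged. The details message is
   * emitted at most once, no matter how many iterations see failures.
   */
  public static List<String> reconstruct(StripeReader reader,
      long safeBlockGroupLength) {
    List<String> log = new ArrayList<>();
    boolean detailsLogged = false;    // hypothetical guard flag
    long length = safeBlockGroupLength;
    while (length > 0) {
      int readLen = reader.recoverChunks();
      if (readLen <= 0) {
        break;                        // avoid spinning if nothing was read
      }
      if (!detailsLogged && !reader.getFailedIndexes().isEmpty()) {
        // Stand-in for logBlockGroupDetails(blockLocationInfo, repConfig,
        // blockDataGroup); the inputs do not change inside the loop.
        log.add("Block group details (logged once)");
        detailsLogged = true;
      }
      length -= readLen;              // assumed; not shown in the excerpt
    }
    return log;
  }

  public static void main(String[] args) {
    // A reader that always reports index 2 as failed, 1024 bytes per call.
    StripeReader reader = new StripeReader() {
      @Override
      public int recoverChunks() {
        return 1024;
      }

      @Override
      public Set<Integer> getFailedIndexes() {
        return Set.of(2);
      }
    };
    // Ten iterations, but only one details message.
    System.out.println(reconstruct(reader, 10 * 1024).size()); // prints 1
  }
}
```

With the flag in place the message count no longer grows with the number of loop iterations. Whether a real fix should use a flag, or move the call out of the loop entirely, depends on what the catch branch logs, which the excerpt does not show.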



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
