[
https://issues.apache.org/jira/browse/HDDS-14853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sammi Chen updated HDDS-14853:
------------------------------
Summary: Reduce duplicate logs created by
ECReconstructionCoordinator#reconstructECBlockGroup (was: Reduce duplicate log
message created by ECReconstructionCoordinator#reconstructECBlockGroup)
> Reduce duplicate logs created by
> ECReconstructionCoordinator#reconstructECBlockGroup
> ------------------------------------------------------------------------------------
>
> Key: HDDS-14853
> URL: https://issues.apache.org/jira/browse/HDDS-14853
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
>
> In an Ozone cluster, the datanode log file was found to be full of the
> following messages:
> {code:java}
> 2026-02-28 13:45:12,139 INFO [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block group details for conID: 79124 locID: 115816896931420725 bcsId: 0 replicaIndex: null. Replication Config EC{rs-3-2-1024k}. Calculated safe length: 805306368.
> 2026-02-28 13:45:12,140 INFO [ContainerReplicationThread-0]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator: Block Data for: conID: 79124 locID: 115816896931420725 bcsId: 0 replicaIndex: 2 replica Index: 2 block length: 268435456 block group length: 805306368 chunk list:
> {code}
> A 77 MB datanode log file with 1588052 lines covers 2026-02-28 13:44:48 to
> 2026-02-28 13:45:12, less than one minute. It contains no ERROR, WARN, or
> exception entries, only messages of the type shown above.
> The code below produces these log messages. Assume one failed datanode
> cannot be reached. In the while loop, each call to sis.recoverChunks returns
> a full stripe of data (not sure if it is exactly one stripe, but likely far
> less than the block size). Each time, failedIndexes is not empty, so
> logBlockGroupDetails is called once per iteration, and hundreds of duplicate
> messages are created during the reconstruction of one block.
> {code:java}
> if (!toReconstructIndexes.isEmpty()) {
>   sis.setRecoveryIndexes(toReconstructIndexes.stream().map(i -> (i - 1))
>       .collect(Collectors.toSet()));
>   long length = safeBlockGroupLength;
>   while (length > 0) {
>     int readLen;
>     try {
>       readLen = sis.recoverChunks(bufs);
>       Set<Integer> failedIndexes = sis.getFailedIndexes();
>       if (!failedIndexes.isEmpty()) {
>         // There was a problem reading some of the block indexes, but we
>         // did not get an exception as there must have been spare indexes
>         // to try and recover from. Therefore we should log out the block
>         // group details in the same way as for the exception case below.
>         logBlockGroupDetails(blockLocationInfo, repConfig, blockDataGroup);
>       }
>     } catch (IOException e) {
> {code}
> The inputs to logBlockGroupDetails() (blockLocationInfo, repConfig, and
> blockDataGroup) do not change during the while loop, so it is enough to log
> them once.
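
For illustration only, here is a minimal sketch of the log-once idea. It is not the actual patch and does not use Ozone classes: LogOnceSketch, runLoop, and stripesPerBlockGroup are hypothetical names, and the loop only counts how often a stand-in for logBlockGroupDetails() would fire. The constants come from the log lines quoted above (rs-3-2-1024k, block group length 805306368). That each recoverChunks call returns exactly one stripe is an assumption, as noted in the description.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical stand-in for the reconstruction loop; not Ozone code. */
final class LogOnceSketch {

  // Values taken from the quoted log lines (EC rs-3-2-1024k).
  static final long BLOCK_GROUP_LENGTH = 805306368L;
  static final int DATA_BLOCKS = 3;
  static final long CHUNK_SIZE = 1024L * 1024L;

  /**
   * Loop iterations per block group, ASSUMING every read returns
   * exactly one full stripe (DATA_BLOCKS * CHUNK_SIZE bytes).
   */
  static long stripesPerBlockGroup() {
    long stripeSize = DATA_BLOCKS * CHUNK_SIZE;
    return (BLOCK_GROUP_LENGTH + stripeSize - 1) / stripeSize;
  }

  /**
   * Simulates the while loop and returns how many times the block group
   * details would be logged. failedIndexes stands in for the result of
   * sis.getFailedIndexes(), which stays non-empty while a datanode is down.
   */
  static int runLoop(long iterations, Set<Integer> failedIndexes,
      boolean logOnce) {
    int logCalls = 0;
    boolean detailsLogged = false; // guard flag for the log-once variant
    for (long i = 0; i < iterations; i++) {
      boolean suppressed = logOnce && detailsLogged;
      if (!failedIndexes.isEmpty() && !suppressed) {
        logCalls++; // stands in for logBlockGroupDetails(...)
        detailsLogged = true;
      }
    }
    return logCalls;
  }

  public static void main(String[] args) {
    long iterations = stripesPerBlockGroup();
    Set<Integer> failed = Collections.singleton(1);
    System.out.println("iterations: " + iterations);
    System.out.println("log calls, current behaviour: "
        + runLoop(iterations, failed, false));
    System.out.println("log calls, log-once guard: "
        + runLoop(iterations, failed, true));
  }
}
```

Under the one-stripe-per-read assumption, the loop runs 256 times for this block group, so the unguarded version logs the details 256 times while the guarded version logs them once.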