[ https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edoardo Comar reopened KAFKA-15905: ----------------------------------- reopening for backporting to 3.7.1 to be confermed > Restarts of MirrorCheckpointTask should not permanently interrupt offset > translation > ------------------------------------------------------------------------------------ > > Key: KAFKA-15905 > URL: https://issues.apache.org/jira/browse/KAFKA-15905 > Project: Kafka > Issue Type: Improvement > Components: mirrormaker > Affects Versions: 3.6.0 > Reporter: Greg Harris > Assignee: Edoardo Comar > Priority: Major > Fix For: 3.7.1, 3.8 > > > Executive summary: When the MirrorCheckpointTask restarts, it loses the state > of checkpointsPerConsumerGroup, which limits offset translation to records > mirrored after the latest restart. > For example, if 1000 records are mirrored and the OffsetSyncs are read by > MirrorCheckpointTask, the emitted checkpoints are cached, and translation can > happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more > records are mirrored, translation can happen at the ~1500th record, but no > longer at the ~500th record. > Context: > Before KAFKA-13659, MM2 made translation decisions based on the > incompletely-initialized OffsetSyncStore, and the checkpoint could appear to > go backwards temporarily during restarts. To fix this, we forced the > OffsetSyncStore to initialize completely before translation could take place, > ensuring that the latest OffsetSync had been read, and thus providing the > most accurate translation. > Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. > Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to > allow for translation of earlier offsets. This came with the caveat that the > cache's sparseness allowed translations to go backwards permanently. To > prevent this behavior, a cache of the latest Checkpoints was kept in the > MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset > translation remained restricted to the fully-initialized OffsetSyncStore. > Effectively, the MirrorCheckpointTask ensures that it translates based on an > OffsetSync emitted during it's lifetime, to ensure that no previous > MirrorCheckpointTask emitted a later sync. If we can read the checkpoints > emitted by previous generations of MirrorCheckpointTask, we can still ensure > that checkpoints are monotonic, while allowing translation further back in > history. -- This message was sent by Atlassian Jira (v8.20.10#820010)