bhavna2004 opened a new pull request, #22516:
URL: https://github.com/apache/kafka/pull/22516

   This PR enhances MirrorMaker 2's `MirrorSourceTask` with fault detection
   and automatic recovery for two failure scenarios in data replication
   pipelines: log truncation, and topic deletion and recreation.
   
   ### Modified Files
   - `connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceTask.java`
   - `connect/mirror/src/test/java/org/apache/kafka/connect/mirror/MirrorSourceTaskTest.java`
   
   ### Task 1 — Log Truncation Detection (Fail-Fast)
   
   Detects silent data loss that occurs when Kafka retention policies purge
   records before MM2 has replicated them.
   
   - Added `detectLogTruncation()` which compares MM2's last committed offset
     against the broker's `logStartOffset` via `consumer.beginningOffsets()`
     at every initialization. This covers both the restart scenario and the
     continuous-running scenario.
   - Added runtime gap detection in `poll()` that throws `DataLossException`
     when consecutive record offsets reveal a forward gap indicating data was
     purged during active replication.
   - Added `DataLossException` as a static inner class to signal
     irrecoverable data loss.
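
The two checks above can be sketched in plain Java, independent of the Kafka client. This is a minimal illustration and not the PR's actual code: the class `TruncationCheck` and both method names are hypothetical. It assumes the committed offset is the offset of the last replicated record (so the next record to read is `lastCommittedOffset + 1`) and that a negative value means nothing has been committed yet.

```java
public class TruncationCheck {

    /**
     * Startup check (hypothetical helper). Assumes lastCommittedOffset is the
     * offset of the last replicated record, or negative if nothing was committed.
     * Data was lost only if the log now starts after the next record to read.
     */
    public static boolean isTruncated(long lastCommittedOffset, long logStartOffset) {
        if (lastCommittedOffset < 0) {
            return false; // nothing replicated yet, so nothing can have been lost
        }
        return logStartOffset > lastCommittedOffset + 1;
    }

    /**
     * Runtime check (hypothetical helper). Flags a forward gap between two
     * consecutively polled records; the first record (negative lastSeenOffset)
     * never counts as a gap.
     */
    public static boolean hasForwardGap(long lastSeenOffset, long currentOffset) {
        if (lastSeenOffset < 0) {
            return false;
        }
        return currentOffset > lastSeenOffset + 1;
    }

    public static void main(String[] args) {
        System.out.println(isTruncated(99, 100));  // boundary case: false
        System.out.println(isTruncated(99, 150));  // offsets 100..149 purged: true
        System.out.println(isTruncated(-1, 50));   // uncommitted: false
        System.out.println(hasForwardGap(10, 11)); // consecutive: false
        System.out.println(hasForwardGap(10, 15)); // gap: true
    }
}
```

One design consideration: compacted topics and transactional topics (whose control records are not returned to the consumer) also produce non-consecutive offsets, so a strict `lastSeenOffset + 1` comparison may need to account for those cases.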
   
   ### Task 2 — Topic Reset Detection and Recovery
   
   Adds automatic recovery from topic deletion and recreation (planned
   maintenance resets).
   
   - Added `recordTopicIds()` which seeds topic UUIDs at initialization using
     the Admin Client.
   - Added `detectAndHandleTopicReset()` which compares current topic IDs
     against stored IDs every 5 seconds in the poll loop. On UUID change,
     seeks all affected partitions to offset 0 and resumes replication
     automatically.
   - Added backward offset jump detection in `poll()` as a complementary
     strategy. When a record arrives with an offset lower than the last seen
     offset, MM2 seeks to the beginning and resumes replication without
     operator intervention.
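
The topic ID comparison and the backward-jump check can be illustrated roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: `TopicResetCheck` and its methods are hypothetical names, `java.util.UUID` stands in for Kafka's own `org.apache.kafka.common.Uuid` topic ID type, and the Admin Client lookup, the 5-second polling interval, and the actual `seek` calls are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

public class TopicResetCheck {

    /**
     * Compares freshly fetched topic IDs against the stored ones and returns the
     * topics whose ID changed. The stored map is updated in place, so a second
     * call with the same input reports nothing (idempotent detection).
     * Topics seen for the first time are recorded but not reported.
     */
    public static Set<String> detectAndUpdate(Map<String, UUID> stored,
                                              Map<String, UUID> current) {
        Set<String> reset = new TreeSet<>();
        for (Map.Entry<String, UUID> entry : current.entrySet()) {
            UUID previous = stored.put(entry.getKey(), entry.getValue());
            if (previous != null && !previous.equals(entry.getValue())) {
                reset.add(entry.getKey()); // topic was deleted and recreated
            }
        }
        return reset;
    }

    /**
     * Complementary check: a record whose offset is lower than the last seen
     * offset suggests the partition was recreated. A negative lastSeenOffset
     * means no record has been seen yet, which is never a backward jump.
     */
    public static boolean isBackwardJump(long lastSeenOffset, long currentOffset) {
        return lastSeenOffset >= 0 && currentOffset < lastSeenOffset;
    }

    public static void main(String[] args) {
        Map<String, UUID> stored = new HashMap<>();
        stored.put("orders", new UUID(0L, 1L));

        Map<String, UUID> current = new HashMap<>();
        current.put("orders", new UUID(0L, 2L)); // same name, new ID

        System.out.println(detectAndUpdate(stored, current)); // prints [orders]
        System.out.println(detectAndUpdate(stored, current)); // prints []
        System.out.println(isBackwardJump(500L, 0L));         // prints true
    }
}
```

In the real task, each topic returned by `detectAndUpdate` would presumably have its assigned partitions sought back to offset 0, as the description above states.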
   
   ### Testing
   
   - Added unit tests covering all fault-detection scenarios including edge cases:
     - Log truncation at startup and during runtime
     - Boundary conditions (logStartOffset == lastCommittedOffset + 1)
     - Uncommitted and negative offset handling
     - Multi-partition truncation scenarios
     - Topic ID change detection and idempotency
     - Backward offset jump detection
     - No false positives on first record or unchanged topic IDs

