Re: [PR] KAFKA-20613: Add code to support copy record in DLQ state manager. [kafka]

via GitHub Mon, 08 Jun 2026 14:17:13 -0700


apoorvmittal10 commented on code in PR #22479:
URL: https://github.com/apache/kafka/pull/22479#discussion_r3376398859



##########
server/src/main/java/org/apache/kafka/server/share/dlq/ShareGroupDLQStateManager.java:
##########
@@ -121,10 +134,16 @@ public ShareGroupDLQStateManager(
             throw new IllegalArgumentException("ShareGroupMetrics must not be null.");
         }
 
+        if (logReader == null) {
+            throw new IllegalArgumentException("LogReader must not be null.");
+        }
+
         this.time = time;
         this.timer = timer;
         this.cacheHelper = cacheHelper;
         this.shareGroupMetrics = shareGroupMetrics;
+        this.maxFetchBytes = maxFetchBytes;
+        this.logReader = logReader;

Review Comment:
   Generally yes, but are we always going to read from the local log and never 
   from remote storage? If remote fetches are required, this approach alone 
   will not work; we would need handling to fetch messages from remote storage.
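
   To make the concern concrete, here is a minimal sketch, assuming a tiered 
   topic where offsets below the local log start offset exist only in remote 
   storage. `OffsetTierSketch`, `Tier`, and `classify` are hypothetical names 
   used for illustration; they are not Kafka APIs, and the offsets are made up.

```java
// Hypothetical stand-in, not Kafka code: classifies where an offset would be served from.
final class OffsetTierSketch {
    enum Tier { LOCAL, REMOTE, UNAVAILABLE }

    // Assumption: [logStartOffset, localLogStartOffset) lives only in remote storage,
    // and [localLogStartOffset, upperBoundExclusive) is readable from the local log.
    static Tier classify(long offset, long logStartOffset,
                         long localLogStartOffset, long upperBoundExclusive) {
        if (offset < logStartOffset || offset >= upperBoundExclusive) {
            return Tier.UNAVAILABLE;
        }
        return offset < localLogStartOffset ? Tier.REMOTE : Tier.LOCAL;
    }

    public static void main(String[] args) {
        // Log spans [0, 100); offsets below 40 have been moved to remote storage.
        System.out.println(classify(10, 0, 40, 100));  // prints REMOTE
        System.out.println(classify(40, 0, 40, 100));  // prints LOCAL
        System.out.println(classify(100, 0, 40, 100)); // prints UNAVAILABLE
    }
}
```

   A DLQ range whose first offset classifies as `REMOTE` is the case a 
   local-only read would not cover.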



##########
server/src/main/java/org/apache/kafka/server/share/dlq/ShareGroupDLQStateManager.java:
##########
@@ -653,6 +684,80 @@ private void handleProduceResponse(ClientResponse response) {
                     requestErrorResponse(clientResponseError.exception());
             }
         }
+
+        private void maybeFetchRecordData() {
+            if (cacheHelper.isShareGroupDlqCopyRecordEnabled(param.groupId())) {
+                // A non-null originalRecordData indicates that the data for the offsets was
+                // already fetched at a previous time. This could happen in case there was
+                // a retriable exception in a previous produce request, and it is being re-sent.
+                // This optimization will help in reducing LogReader.read calls. Note that an
+                // empty (but non-null) map means a previous fetch found no records in range
+                // (e.g. all offsets compacted away), so we still skip re-fetching in that case.
+                if (originalRecordData != null) {
+                    return;
+                }
+
+                long startTime = time.hiResClockMs();
+                TopicIdPartition tp = param.topicIdPartition();
+
+                FetchParams fetchParams = new FetchParams(
+                    FetchRequest.CONSUMER_REPLICA_ID,           // -1, reading as a consumer
+                    -1,                                         // replicaEpoch
+                    0L,                                         // maxWaitMs - don't block
+                    1,                                          // minBytes
+                    maxFetchBytes,                              // maxBytes
+                    FetchIsolation.HIGH_WATERMARK,              // committed only
+                    Optional.empty()                            // clientMetadata
+                );
+
+                long nextOffset = param.firstOffset();
+                long endOffset = param.lastOffset();
+                int recordCount = (int) (param.lastOffset() - param.firstOffset() + 1);
+
+                Map<Long, Record> recordMap = new HashMap<>(recordCount);
+                LinkedHashMap<TopicIdPartition, Long> offsets = new LinkedHashMap<>();
+                LinkedHashMap<TopicIdPartition, Integer> maxBytesMap = new LinkedHashMap<>();
+                maxBytesMap.put(tp, maxFetchBytes);
+
+                // We are fetching data for one TopicIdPartition only. Hence, there
+                // is no need to keep recreating the maxBytes map, and we can re-use a
+                // single copy. In similar vein, we needn't clear the offsets map
+                // either and just update the value corresponding to the TopicIdPartition
+                // key in offsets map within the while loop.
+                while (nextOffset <= endOffset) {
+                    offsets.put(tp, nextOffset);
+
+                    LinkedHashMap<TopicIdPartition, LogReadResult> result =
+                        logReader.read(fetchParams, Set.of(tp), offsets, maxBytesMap);
+
+                    LogReadResult res = result.get(param.topicIdPartition());
+                    if (res == null || res.error().code() != Errors.NONE.code()) {
+                        log.warn("Unable to fetch actual record at offset {} for handler {}.", nextOffset, this);
+                        return;
+                    }
+
+                    boolean done = false;
+                    for (RecordBatch batch : res.info().records.batches()) {
+                        for (Record record : batch) {

Review Comment:
   Can it happen that the offsets were not yet compacted at fetch time, but by 
   the time the DLQ is triggered they have been compacted away and can no 
   longer be found? Can we add a test for this case?
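
   As a rough illustration of the scenario such a test would cover, here is a 
   minimal sketch that models the partition as a sorted map of surviving 
   offsets. `CompactedRangeSketch` and `collect` are hypothetical stand-ins, 
   not the PR's code or Kafka APIs; a real test would go through 
   `ShareGroupDLQStateManager` with a mocked `LogReader`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a compacted partition: only surviving offsets are present.
final class CompactedRangeSketch {
    // Collects the records that still exist in [firstOffset, lastOffset], inclusive.
    static Map<Long, String> collect(NavigableMap<Long, String> log,
                                     long firstOffset, long lastOffset) {
        return new HashMap<>(log.subMap(firstOffset, true, lastOffset, true));
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> log = new TreeMap<>();
        for (long offset = 0; offset < 10; offset++) {
            log.put(offset, "v" + offset);
        }

        // At consumer fetch time, every offset in 3..6 is present.
        System.out.println(collect(log, 3, 6).size()); // prints 4

        // Compaction removes offsets 4 and 5 before the DLQ copy runs.
        log.remove(4L);
        log.remove(5L);

        Map<Long, String> atDlqTime = collect(log, 3, 6);
        System.out.println(atDlqTime.size());          // prints 2
        System.out.println(atDlqTime.containsKey(4L)); // prints false
    }
}
```

   The point of interest is that the collected map can hold fewer entries than 
   `lastOffset - firstOffset + 1`, so whatever consumes it has to tolerate 
   missing offsets.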


