FrankYang0529 commented on code in PR #18012:
URL: https://github.com/apache/kafka/pull/18012#discussion_r1907410734
##########
storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java:
##########
@@ -257,13 +257,21 @@ public void append(long largestOffset,
             if (largestTimestampMs > maxTimestampSoFar()) {
                 maxTimestampAndOffsetSoFar = new TimestampOffset(largestTimestampMs, shallowOffsetOfMaxTimestamp);
             }
-            // append an entry to the index (if needed)
+            // append an entry to the timestamp index at MemoryRecords level (if needed)
             if (bytesSinceLastIndexEntry > indexIntervalBytes) {
-                offsetIndex().append(largestOffset, physicalPosition);
                 timeIndex().maybeAppend(maxTimestampSoFar(), shallowOffsetOfMaxTimestampSoFar());
-                bytesSinceLastIndexEntry = 0;
             }
-            bytesSinceLastIndexEntry += records.sizeInBytes();
+
+            // append an entry to the offset index at batches level (if needed)
+            for (RecordBatch batch : records.batches()) {
+                if (bytesSinceLastIndexEntry > indexIntervalBytes &&
+                        batch.lastOffset() >= offsetIndex().lastOffset()) {
+                    offsetIndex().append(batch.lastOffset(), physicalPosition);
Review Comment:
Hi @junrao, thanks for the review. I addressed both comments.
Regarding timestamps: they are not always monotonic within the records, so the
timestamp index does not yield as many entries (or as precise lookups) as the
offset index. We could consider whether it's worth appending a timestamp entry
for each batch, since that operation would introduce more cost.
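
To make the tradeoff concrete, here is a small standalone sketch (not Kafka code; the `Batch` record and the entry counters are hypothetical stand-ins) showing that batch last-offsets always advance, while per-batch max timestamps need not, so per-batch timestamp-index entries would often be skipped anyway:

```java
import java.util.List;

public class IndexSketch {
    // Hypothetical stand-in for a record batch: its last offset and max timestamp.
    record Batch(long lastOffset, long maxTimestampMs) {}

    // Count how many batches would produce a new offset-index entry vs. a new
    // timestamp-index entry, mirroring the monotonicity checks discussed above.
    static int[] countIndexEntries(List<Batch> batches) {
        long lastIndexedOffset = -1L;
        long maxTimestampSoFar = -1L;
        int offsetEntries = 0;
        int timestampEntries = 0;
        for (Batch b : batches) {
            if (b.lastOffset() > lastIndexedOffset) {      // offsets: always advance
                lastIndexedOffset = b.lastOffset();
                offsetEntries++;
            }
            if (b.maxTimestampMs() > maxTimestampSoFar) {  // timestamps: may go backwards
                maxTimestampSoFar = b.maxTimestampMs();
                timestampEntries++;
            }
        }
        return new int[] {offsetEntries, timestampEntries};
    }

    public static void main(String[] args) {
        // CREATE_TIME timestamps can decrease across batches.
        List<Batch> batches = List.of(
                new Batch(10, 1_000L),
                new Batch(20, 900L),   // older max timestamp than the previous batch
                new Batch(30, 1_100L));
        int[] counts = countIndexEntries(batches);
        System.out.println(counts[0] + " " + counts[1]); // prints "3 2"
    }
}
```

Every batch advances the offset high-water mark (3 entries), but only two batches advance the max timestamp, which is why a per-batch timestamp append buys less than a per-batch offset append.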
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]