kamalcph commented on code in PR #16602:
URL: https://github.com/apache/kafka/pull/16602#discussion_r1759717681
##########
core/src/main/scala/kafka/log/UnifiedLog.scala:
##########
@@ -1336,34 +1336,39 @@ class UnifiedLog(@volatile var logStartOffset: Long,
val maxTimestampSoFar =
latestTimestampSegment.readMaxTimestampAndOffsetSoFar
// lookup the position of batch to avoid extra I/O
val position =
latestTimestampSegment.offsetIndex.lookup(maxTimestampSoFar.offset)
- latestTimestampSegment.log.batchesFrom(position.position).asScala
+ val timestampAndOffsetOpt =
latestTimestampSegment.log.batchesFrom(position.position).asScala
.find(_.maxTimestamp() == maxTimestampSoFar.timestamp)
.flatMap(batch => batch.offsetOfMaxTimestamp().asScala.map(new
TimestampAndOffset(batch.maxTimestamp(), _,
Optional.of[Integer](batch.partitionLeaderEpoch()).filter(_ >=
0))))
+ OffsetResultHolder(timestampAndOffsetOpt)
Review Comment:
I went over #15621 to see how we handle the MAX_TIMESTAMP case for normal
topics.
Now, we maintain the
[shallowOffsetOfMaxTimestampSoFar](https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java?L212)
in LogSegment instead of the real max-timestamp offset. While uploading the
LogSegment to remote, we create the
[RemoteLogSegmentMetadata](https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/storage/api/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogSegmentMetadata.java)
event, which holds the metadata about the segment.
Even if we pass the `shallowOffsetOfMaxTimestampSoFar` in the
RemoteLogSegmentMetadata event, we still have to download the remote log
segment to find the real `offsetOfMaxTimestampSoFar`. This will increase the
load on remote storage, given that the original intention of KIP-734 was
`Confirming topic/partition "liveness"`, which means the Admin client will
repeatedly invoke the list-offsets API for MAX_TIMESTAMP.
The predominant case, confirming topic/partition "liveness", queries the
local-log and works as expected. For MAX_TIMESTAMP, when enabled with remote
storage, the results can go wrong when:
1. The timestamps are monotonic, all the passive segments have been uploaded
to remote and deleted from local disk, and the only segment left on local disk,
the active segment, is empty (post log roll). The max-timestamp will then be
returned as "-1".
2. The timestamps are non-monotonic, so the (timestamp, offset) returned by
the API may not be the "TRUE" (max-timestamp, max-timestamp-offset), as it
considers only the segments in the local-log.
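Both failure modes can be illustrated with a hedged toy model. This is plain Java, not Kafka internals; `Rec`, `Segment`, and `localMaxTimestamp` are hypothetical names standing in for the real lookup, which likewise consults only local segments and falls back to the sentinel timestamp -1:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MaxTimestampSketch {
    // Hypothetical stand-ins for records and log segments.
    record Rec(long timestamp, long offset) {}
    record Segment(List<Rec> records) {}

    // Mirrors the local-only search: only local segments are consulted,
    // and an empty local log yields the sentinel (-1, -1).
    static long[] localMaxTimestamp(List<Segment> localSegments) {
        return localSegments.stream()
                .flatMap(s -> s.records().stream())
                .max(Comparator.comparingLong(Rec::timestamp))
                .map(r -> new long[]{r.timestamp(), r.offset()})
                .orElse(new long[]{-1L, -1L});
    }

    public static void main(String[] args) {
        // Case 1 (monotonic): passive segments moved to remote and deleted
        // locally; only an empty post-roll active segment remains.
        List<Segment> local1 = List.of(new Segment(List.of()));
        System.out.println(Arrays.toString(localMaxTimestamp(local1))); // [-1, -1]

        // Case 2 (non-monotonic): suppose the true max-timestamp (300) sits
        // in a segment already uploaded to remote; the local-only answer
        // below is not the "TRUE" (max-timestamp, max-timestamp-offset).
        List<Segment> local2 = List.of(new Segment(List.of(new Rec(200L, 5L))));
        System.out.println(Arrays.toString(localMaxTimestamp(local2))); // [200, 5]
    }
}
```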
Should we handle or drop the MAX_TIMESTAMP case for topics enabled with remote
storage? Handling it can cause high load:
1. On the RemoteLogMetadataManager, as we have to scan all the uploaded
segment events to find the max-timestamp and then compare it with the
max-timestamp computed from the local-log segments. The search should
always proceed from remote to local storage.
2. On the RemoteStorageManager, when the MAX_TIMESTAMP record exists in
remote storage, we have to download that segment (a few bytes)
repeatedly to serve the query.
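To make the cost in point (1) concrete, here is a hedged toy sketch, using plain records rather than the real RemoteLogMetadataManager API (`SegmentMeta` and `maxTimestampAcross` are hypothetical names): every MAX_TIMESTAMP query would scan all uploaded-segment metadata and then compare against the local result.

```java
import java.util.List;

public class RemoteScanSketch {
    // Hypothetical stand-in for the per-segment metadata the scan needs.
    record SegmentMeta(long baseOffset, long maxTimestampMs) {}

    // O(n) in the number of uploaded segments, on every MAX_TIMESTAMP query.
    static long maxTimestampAcross(List<SegmentMeta> remoteSegments,
                                   long localMaxTimestampMs) {
        long remoteMax = remoteSegments.stream()
                .mapToLong(SegmentMeta::maxTimestampMs)
                .max()
                .orElse(-1L);
        // Search proceeds remote-first, then compares with the local result.
        return Math.max(remoteMax, localMaxTimestampMs);
    }

    public static void main(String[] args) {
        List<SegmentMeta> remote = List.of(
                new SegmentMeta(0L, 1500L),
                new SegmentMeta(100L, 1200L));
        System.out.println(maxTimestampAcross(remote, 1300L)); // 1500
    }
}
```

And that only yields the segment holding the max-timestamp record; per point (2), serving the offset still requires downloading that segment from the RemoteStorageManager.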
In
[KIP-734](https://cwiki.apache.org/confluence/display/KAFKA/KIP-734%3A+Improve+AdminClient.listOffsets+to+return+timestamp+and+offset+for+the+record+with+the+largest+timestamp),
can we make an addendum to say that MAX_TIMESTAMP is not supported for topics
enabled with remote storage? Note that when the KIP was proposed, the intention
was not to read from disk:
Snippet from
[KIP-734](https://cwiki.apache.org/confluence/display/KAFKA/KIP-734%3A+Improve+AdminClient.listOffsets+to+return+timestamp+and+offset+for+the+record+with+the+largest+timestamp):
```
LogSegments track the highest timestamp and associated offset so we don't
have to go to disk to fetch this
```
cc @satishd @showuon
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]