kamalcph commented on code in PR #16602:
URL: https://github.com/apache/kafka/pull/16602#discussion_r1759717681
##########
core/src/main/scala/kafka/log/UnifiedLog.scala:
##########
@@ -1336,34 +1336,39 @@ class UnifiedLog(@volatile var logStartOffset: Long,
val maxTimestampSoFar =
latestTimestampSegment.readMaxTimestampAndOffsetSoFar
// lookup the position of batch to avoid extra I/O
val position =
latestTimestampSegment.offsetIndex.lookup(maxTimestampSoFar.offset)
- latestTimestampSegment.log.batchesFrom(position.position).asScala
+ val timestampAndOffsetOpt =
latestTimestampSegment.log.batchesFrom(position.position).asScala
.find(_.maxTimestamp() == maxTimestampSoFar.timestamp)
.flatMap(batch => batch.offsetOfMaxTimestamp().asScala.map(new
TimestampAndOffset(batch.maxTimestamp(), _,
Optional.of[Integer](batch.partitionLeaderEpoch()).filter(_ >=
0))))
+ OffsetResultHolder(timestampAndOffsetOpt)
Review Comment:
I went over #15621 to see how we handle the MAX_TIMESTAMP case for normal
topics.
Now, we maintain the
[shallowOffsetOfMaxTimestampSoFar](https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java?L212)
in LogSegment instead of the real max-timestamp offset. While uploading the
LogSegment to remote, we create the
[RemoteLogSegmentMetadata](https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/storage/api/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogSegmentMetadata.java)
event, which holds the metadata about the segment.
Even if we pass the `shallowOffsetOfMaxTimestampSoFar` in the
RemoteLogSegmentMetadata event, we still have to download the remote log
segment to find the real `offsetOfMaxTimestampSoFar`. This will increase the
load on remote storage, given that the original intention of KIP-734 was
`Confirming topic/partition "liveness"`, which means the Admin client will
repeatedly invoke the list-offsets API for MAX_TIMESTAMP.
The predominant case, confirming topic/partition "liveness", queries the
local-log and works as expected. For MAX_TIMESTAMP, when enabled with remote
storage, the results can go wrong when:
1. The timestamps are monotonic, all the passive segments have been uploaded
to remote and deleted from local disk, and the only segment left on local disk,
the active segment, is empty (post log roll). The max-timestamp will then be
returned as "-1".
2. The timestamps are non-monotonic, so the (timestamp, offset) returned by
the API may not be the "TRUE" (max-timestamp, max-timestamp-offset), as it
considers only the segments in the local-log.
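Both failure modes can be illustrated with a hedged toy model. This is plain Java, not Kafka internals; `Rec`, `Segment`, and `localMaxTimestamp` are hypothetical names standing in for the real lookup, which likewise consults only local segments and falls back to the sentinel timestamp -1:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MaxTimestampSketch {
    // Hypothetical stand-ins for records and log segments.
    record Rec(long timestamp, long offset) {}
    record Segment(List<Rec> records) {}

    // Mirrors the local-only search: only local segments are consulted,
    // and an empty local log yields the sentinel (-1, -1).
    static long[] localMaxTimestamp(List<Segment> localSegments) {
        return localSegments.stream()
                .flatMap(s -> s.records().stream())
                .max(Comparator.comparingLong(Rec::timestamp))
                .map(r -> new long[]{r.timestamp(), r.offset()})
                .orElse(new long[]{-1L, -1L});
    }

    public static void main(String[] args) {
        // Case 1 (monotonic): passive segments moved to remote and deleted
        // locally; only an empty post-roll active segment remains.
        List<Segment> local1 = List.of(new Segment(List.of()));
        System.out.println(Arrays.toString(localMaxTimestamp(local1))); // [-1, -1]

        // Case 2 (non-monotonic): suppose the true max-timestamp (300) sits
        // in a segment already uploaded to remote; the local-only answer
        // below is not the "TRUE" (max-timestamp, max-timestamp-offset).
        List<Segment> local2 = List.of(new Segment(List.of(new Rec(200L, 5L))));
        System.out.println(Arrays.toString(localMaxTimestamp(local2))); // [200, 5]
    }
}
```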
Should we handle or drop the MAX_TIMESTAMP case for topics enabled with remote
storage? Handling it can cause high load:
1. On the RemoteLogMetadataManager, as we have to scan all the uploaded
segment events to find the max-timestamp and then compare it with the
max-timestamp computed from the local-log segments. The search should
always proceed from remote to local storage.
2. On the RemoteStorageManager, when the MAX_TIMESTAMP record exists in
remote storage, we have to download that segment (a few bytes)
repeatedly to serve the query.
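To make the cost in point (1) concrete, here is a hedged toy sketch, using plain records rather than the real RemoteLogMetadataManager API (`SegmentMeta` and `maxTimestampAcross` are hypothetical names): every MAX_TIMESTAMP query would scan all uploaded-segment metadata and then compare against the local result.

```java
import java.util.List;

public class RemoteScanSketch {
    // Hypothetical stand-in for the per-segment metadata the scan needs.
    record SegmentMeta(long baseOffset, long maxTimestampMs) {}

    // O(n) in the number of uploaded segments, on every MAX_TIMESTAMP query.
    static long maxTimestampAcross(List<SegmentMeta> remoteSegments,
                                   long localMaxTimestampMs) {
        long remoteMax = remoteSegments.stream()
                .mapToLong(SegmentMeta::maxTimestampMs)
                .max()
                .orElse(-1L);
        // Search proceeds remote-first, then compares with the local result.
        return Math.max(remoteMax, localMaxTimestampMs);
    }

    public static void main(String[] args) {
        List<SegmentMeta> remote = List.of(
                new SegmentMeta(0L, 1500L),
                new SegmentMeta(100L, 1200L));
        System.out.println(maxTimestampAcross(remote, 1300L)); // 1500
    }
}
```

And that only yields the segment holding the max-timestamp record; per point (2), serving the offset still requires downloading that segment from the RemoteStorageManager.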
In
[KIP-734](https://cwiki.apache.org/confluence/display/KAFKA/KIP-734%3A+Improve+AdminClient.listOffsets+to+return+timestamp+and+offset+for+the+record+with+the+largest+timestamp),
can we make an addendum to say that MAX_TIMESTAMP is not supported for topics
enabled with remote storage? Note that when the KIP was proposed, the intention
was not to read from disk:
Snippet from
[KIP-734](https://cwiki.apache.org/confluence/display/KAFKA/KIP-734%3A+Improve+AdminClient.listOffsets+to+return+timestamp+and+offset+for+the+record+with+the+largest+timestamp):
```
LogSegments track the highest timestamp and associated offset so we don't
have to go to disk to fetch this
```
cc @satishd @showuon
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]