Werner Daehn created KAFKA-20276:
------------------------------------
Summary: API to read data between offsets
Key: KAFKA-20276
URL: https://issues.apache.org/jira/browse/KAFKA-20276
Project: Kafka
Issue Type: Improvement
Components: consumer
Affects Versions: 4.1.1
Reporter: Werner Daehn
For introspecting a topic/partition it is quite common to
* Read the last 100 messages
* Read the first 100 messages
* Read all messages between offset 100 and 200
This can be done today by reading the watermarks to derive the start offset for
the first two use cases, then positioning the consumer at the start offset,
polling repeatedly, and stopping once the end offset has been overshot.
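The workaround described above can be sketched roughly like this (a minimal sketch against the standard `KafkaConsumer` API; the class name, poll timeout, and error handling are illustrative, and it requires a running broker):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

public class RangeReader {
    /**
     * Reads all records of one partition with
     * startOffset <= offset < endOffset using the existing poll-based API.
     */
    static List<ConsumerRecord<byte[], byte[]>> readRange(
            Consumer<byte[], byte[]> consumer,
            TopicPartition tp, long startOffset, long endOffset) {
        consumer.assign(List.of(tp));
        consumer.seek(tp, startOffset);
        List<ConsumerRecord<byte[], byte[]>> result = new ArrayList<>();
        // Keep polling until the consumer position has passed the end offset;
        // each poll may block for up to the timeout even when only one record
        // of the range is still outstanding.
        while (consumer.position(tp) < endOffset) {
            ConsumerRecords<byte[], byte[]> records =
                    consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<byte[], byte[]> r : records.records(tp)) {
                if (r.offset() >= endOffset) {
                    return result; // overshot the requested range
                }
                result.add(r);
            }
        }
        return result;
    }
}
```

Roughly fifteen lines of stateful loop code for what the proposed API would express in a single call.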
But besides being complicated, it also comes with downsides.
It is complicated because asynchronous polling is needed, whereas all the user
wants to do is read a "file" from line 100 to 200, via an API like `msg[]
consumer.read(topicpartition, startoffset, endoffset)`.
And it is full of side effects.
What are the last 100 messages? A compaction might have happened, leaving holes
in the offset sequence, so you might need to read offsets 50-200 to get the
last 100 messages.
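To make the compaction point concrete, here is a small self-contained sketch (the offsets are made up for illustration): with holes below offset 150, the last 100 surviving messages span 149 offsets, not 100.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactedOffsets {
    // After log compaction the surviving offsets can have holes, so the
    // "last 100 messages" may span far more than 100 offsets.
    static long startOffsetForLastN(List<Long> survivingOffsets, int n) {
        return survivingOffsets.get(survivingOffsets.size() - n);
    }

    public static void main(String[] args) {
        // Made-up example: offsets 0..200 were written; compaction removed
        // every other record in 50..149, and everything below 50 entirely.
        List<Long> survivors = new ArrayList<>();
        for (long o = 50; o < 150; o += 2) survivors.add(o); // 50 survivors
        for (long o = 150; o <= 200; o++) survivors.add(o);  // 51 survivors

        long start = startOffsetForLastN(survivors, 100);
        // The last 100 messages start at offset 52, i.e. the reader must
        // scan offsets 52..200 (149 offsets) to collect 100 messages.
        System.out.println("start offset for last 100: " + start);
    }
}
```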
Do you want to skip or include tombstone messages?
If the last poll returned offset 199, the next poll(10) waits the full 10
seconds for more data and then returns just the message at offset 200. Setting
the poll size to one record comes with its own downsides.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)