Werner Daehn created KAFKA-20276:
------------------------------------

             Summary: API to read data between offsets
                 Key: KAFKA-20276
                 URL: https://issues.apache.org/jira/browse/KAFKA-20276
             Project: Kafka
          Issue Type: Improvement
          Components: consumer
    Affects Versions: 4.1.1
            Reporter: Werner Daehn


For introspecting a topic/partition it is quite common to
 * Read the last 100 messages
 * Read the first 100 messages
 * Read all messages between offset 100 and 200

This can be done today by reading the watermarks for the first two use cases, 
or by positioning the consumer at the start offset, polling repeatedly, and 
stopping once the end offset has been overshot.
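The workaround above can be sketched with the existing consumer API. The class 
and method names are illustrative only, not a naming proposal; a built-in 
range-read API would replace this whole loop:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

public final class RangeRead {
    /**
     * Read all messages with startOffset <= offset < endOffset from one
     * partition. Sketch of the manual workaround described above.
     */
    static <K, V> List<ConsumerRecord<K, V>> readRange(
            Consumer<K, V> consumer, TopicPartition tp,
            long startOffset, long endOffset) {
        consumer.assign(Collections.singletonList(tp));
        // Never wait for offsets past the current log end: they do not exist yet.
        long logEnd = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
        long stop = Math.min(endOffset, logEnd);
        consumer.seek(tp, startOffset);
        List<ConsumerRecord<K, V>> result = new ArrayList<>();
        while (consumer.position(tp) < stop) {
            ConsumerRecords<K, V> batch = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<K, V> r : batch.records(tp)) {
                if (r.offset() >= endOffset) {
                    return result; // the batch overshot the requested range
                }
                result.add(r);
            }
        }
        return result;
    }
}
```

Note that position() rather than a record counter has to drive the loop, 
because compaction can leave holes in the offset sequence.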

But besides being complicated, this approach comes with downsides.

It is complicated because asynchronous polling is needed, whereas all the user 
wants to do is read a "file" from line 100 to 200. An API like `msg[] 
consumer.read(topicpartition, startoffset, endoffset)` would express this 
directly.

 

And it is full of pitfalls.

What are the last 100 messages? If compaction has happened, the offset 
sequence has holes in it, so you might need to read offsets 50-200 to get 100 
actual messages.

Do you want to skip or include tombstone messages?

If the last poll returned offset 199, the next poll(10) waits the full 10 
seconds for more data and then returns just the message at offset 200. Setting 
the poll size to 1 record has its own downsides.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
