[ 
https://issues.apache.org/jira/browse/KAFKA-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389466#comment-14389466
 ] 

Jiangjie Qin commented on KAFKA-2076:
-------------------------------------

[~jkreps] Totally agree that we need to think about improvement on new consumer 
API to fix the gap between new consumer and old consumer. I think the new 
consumer API should provide all reasonable functions simple consumer supports. 
Obviously it won't be as strong because currently simple consumer supports 
sending every type of request and responses.
I think we can create the following new APIs:
long earliestOffsetsFor(List<TopicPartition> partitions)
long latestOffsetsFor(List<TopicPartition> partitions)
long offsetsBefore(Map<TopicPartition, Long> partitions)

Not sure if we need a KIP on this, if you think so please let me know.

I went through all the possible requests simple consumer can send. 
1. ConsumerMetadataRequest - Should be supported after coordinator 
implementation
2. FetchRequest - supported with seek() and poll()
3. OffsetCommitRequest - supported with commit() and commit(offsetMap)
4. OffsetRequest - not supported
5. OffsetFetchRequest - supported by committed()
6. TopicMetadataRequest - supported with partitionsFor()
>From what I can see and also as you mentioned, there are only following 
>functions that are useful but not yet supported by new consumer yet:
1. getOffsetBefore
2. earliestOrLatest
3. high watermark

Implementation wise, I kind of think high watermark is a little bit different 
from other offsets in the following ways:
1. Consumer offset is determined by consumer, while HW is determined by 
producer. This means consumer offsets needs only minimum communication with 
broker, but HW needs frequent communication.
2. Typically user will only fetch offsets when starting consumption but user 
may care about HW both before starting consumption and during the consuming as 
it reflects lags. This means the HW updates should be cheap otherwise the 
overhead would be big.
Based on the above two points. I think 
The implementation I have in mind now is to store HW in a separate map. The 
value will be updated when:
1. latestOffsetFor(partition) is called, which always sends a OffsetRequest.
2. a piggybacked HW from fetch request is received.

I don't quite understand why the server side offset time stamp would be a 
blocker for adding this API. It looks transparent to the user, right?

> Add an API to new consumer to allow user get high watermark of partitions.
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-2076
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2076
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jiangjie Qin
>
> We have a use case that user wants to know how far it is behind a particular 
> partition on startup. Currently in each fetch response, we have high 
> watermark for each partition, we only keep a global max-lag metric. It would 
> be better that we keep a record of high watermark per partition and update it 
> on each fetch response. We can add a new API to let user query the high 
> watermark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to