Thanks for the feedback Jun, Jay and Ismael. To answer Jun's question > 1. In ListOffsetResponse, I am not sure if it's useful to return the > timestamp since the user can always find the timestamp by fetching the > message at the returned offset.
Returning timestamp with offset may be useful if users do not always want to read the message afterwards. For example, user may not want to consume a message if its timestamp is too far away from the target timestamp. Also users may only want to perform some query based timestamp without consuming all the messages. e.g. total number of messages in a time range. Regarding Jay's question: > One minor thing. I think the old v0 list offsets request also gave you the > highwater mark, it kind of shoves it in as the last thing in the array of > offsets. This is used internally to implement seekToEnd() iirc. How would > that work once v0 is removed? Because we will preserve the behavior of target time -1(latest) and -2(earliest). The high watermark will be returned if the target time is -1 (latest) in ListOffsetRequest v1. From consumer's point of view, it will not see the messages beyond the high watermark, so "latest" is the high watermark. It seems that KafkaConsumer.seekToEnd() is already doing that, so the behavior won't change. For the batch processing use case, it seems also supported by the ListOffsetRequest with -1 target time. Admittedly, if a batch processing always consumes from a begin timestamp until the current high watermark, piggy backing the high watermark in the ListOffsetResponse v1 may save one ListOffsetRequest. ( If we don't piggy back the high watermark, user needs to issue two ListOffsetRequests, one for begin time, the other for the high watermark). However, it seems make the protocol less clean. @Ismael, I fixed the wiki you pointed out. offsetsForTimes() sounds good to me. The currently offsetsForTimes() API obviously does not support querying multiple timestamps for the same partition. It doesn't seems a feature for ListOffsetRequest v0 either (sounds more like a bug). My intuition is that it's a rare use case. Given it does not exist before and we don't see a strong need from the community either, maybe it is better to keep it simple for ListOffsetRequest v1. We can add it later if it turns out to be a useful feature (that may need a interface change, but I honestly do not think people would frequently query many different timestamps for the same partition) Have a good long weekend! Thanks, Jiangjie (Becket) Qin On Fri, Sep 2, 2016 at 6:10 PM, Ismael Juma <ism...@juma.me.uk> wrote: > Thanks for the proposal Becket. Looks good overall, a few comments: > > ListOffsetResponse => [TopicName [PartitionOffsets]] > > PartitionOffsets => Partition ErrorCode Timestamp [Offset] > > Partition => int32 > > ErrorCode => int16 > > Timestamp => int64 > > Offset => int > > > It should be int64 for `Offset` right? > > Implementation wise, we will migrate to o.a.k.common.requests. > ListOffsetRequest > > class on the broker side. > > > Could you clarify what you mean here? We already > use o.a.k.common.requests.ListOffsetRequest in KafkaApis. > > long offset = consumer.offsetForTime(Collections.singletonMap( > topicPartition, > > targetTime)).offset; > > > The result of `offsetForTime` is a Map, so we can't just call `offset` on > it. You probably meant something like: > > long offset = consumer.offsetForTime(Collections.singletonMap( > topicPartition, > targetTime)).get(topicPartition).offset; > > Test searchByTimestamp with CreateTime and LogAppendTime > > > > Do you mean `Test offsetForTime`? > > And: > > 1. In KAFKA-1588, the following issue was described "When performing an > OffsetRequest, if you request the same topic and partition combination in a > single request more than once (for example, if you want to get both the > head and tail offsets for a partition in the same request), you will get a > response for both, but they will be the same offset". Will the new request > version support the use case where multiple timestamps are passed for the > same topic partition? And if we do support it at the protocol level, do we > also want to support it at the API level or do we think the additional > complexity is not worth it? > > 2. Is `offsetForTime` the right method name given that we are getting > multiple offsets? Maybe it should be `offsetsForTimes` or something like > that. > > Ismael > > On Wed, Aug 31, 2016 at 4:38 AM, Becket Qin <becket....@gmail.com> wrote: > > > Hi Kafka devs, > > > > I created KIP-79 to allow consumer to precisely query the offsets based > on > > timestamp. > > > > In short we propose to : > > 1. add a ListOffsetRequest/ListOffsetResponse v1, and > > 2. add an offsetForTime() method in new consumer. > > > > The KIP wiki is the following: > > https://cwiki.apache.org/confluence/pages/viewpage. > action?pageId=65868090 > > > > Comments are welcome. > > > > Thanks, > > > > Jiangjie (Becket) Qin > > >