Hi Jason,

Thanks for the feedback. That is a good point. For the -1 and -2 semantics,
I was just thinking we will preserve the semantics in the wire protocol.
For the user facing API, I agree that is not intuitive. We can do one of
the following:
1. Add two separate methods: earliestOffsets() and latestOffsets().
2. just have offsetsForTimes() and return the earliest if the timestamp is
negative and the latest if the timestamp is Long.MAX_VALUE.

The good thing about doing (1) is that we kind of have symmetric function
signatures like seekToBeginning() and seekToEnd(). However, even if we do
(1), we may still need to do (2) to handle the negative timestamp and the
Long.MAX_VALUE timestamp in offsetsForTimes(). Then they essentially become
redundant to earliestOffsets() and latestOffsets().

Personally I prefer option (2) because of the conciseness and it seems
intuitive enough. But I am open to option (1) as well.

Thanks,

Jiangjie (Becket) Qin

On Wed, Sep 7, 2016 at 11:06 AM, Jason Gustafson <ja...@confluent.io> wrote:

> Hey Becket,
>
> Thanks for the KIP. As I understand, the intention is to preserve the
> current behavior with a timestamp of -1 indicating latest timestamp and -2
> indicating earliest timestamp. So users can query these offsets using the
> offsetsForTimes API if they know the magic values. I'm wondering if it
> would make the usage a little nicer to have a separate API instead for
> these special cases? Sort of in the way that we expose a generic seek() and
> a seekToBeginning(), maybe we could have an earliestOffset() in addition to
> offsetsForTimes()?
>
> Thanks,
> Jason
>
> On Wed, Sep 7, 2016 at 10:04 AM, Becket Qin <becket....@gmail.com> wrote:
>
> > Thanks everyone for all the feedback.
> >
> > If there is no further concerns or comments I will start a voting thread
> on
> > this KIP tomorrow.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Sep 6, 2016 at 9:48 AM, Becket Qin <becket....@gmail.com> wrote:
> >
> > > Hi Magnus,
> > >
> > > Thanks for the comments. I agree that querying messages within a time
> > > range is a valid use case (actually this is an example use case in my
> > > previous email). The current proposal can achieve this by having two
> > > ListOffsetRequest, right? I think the current API already supports the
> > use
> > > cases that require the offsets for multiple timestamps. The question is
> > > that whether it is worth adding more complexity to the protocol to make
> > it
> > > easier for multiple timestamp query. Personally I think given that
> query
> > > multiple timestamps is likely an infrequent operation, there is no need
> > to
> > > optimize for it and complicates the protocol.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Sep 5, 2016 at 11:21 PM, Magnus Edenhill <mag...@edenhill.se>
> > > wrote:
> > >
> > >> Good write-up Qin, the API looks promising.
> > >>
> > >> I have one comment:
> > >>
> > >> 2016-09-03 5:20 GMT+02:00 Becket Qin <becket....@gmail.com>:
> > >>
> > >> > The currently offsetsForTimes() API obviously does not support
> > querying
> > >> > multiple timestamps for the same partition. It doesn't seems a
> feature
> > >> for
> > >> > ListOffsetRequest v0 either (sounds more like a bug). My intuition
> is
> > >> that
> > >> > it's a rare use case. Given it does not exist before and we don't
> see
> > a
> > >> > strong need from the community either, maybe it is better to keep it
> > >> simple
> > >> > for ListOffsetRequest v1. We can add it later if it turns out to be
> a
> > >> > useful feature (that may need a interface change, but I honestly do
> > not
> > >> > think people would frequently query many different timestamps for
> the
> > >> same
> > >> > partition)
> > >> >
> > >>
> > >> I argue that the current behaviour of OffsetRequest with regards to
> > >> duplicate partitions is a bug
> > >> and think it would be a mistake to move the same semantics over to
> thew
> > >> new
> > >> ListOffset API.
> > >> One use case is that an application may want to know the offset range
> > >> between two timestamps,
> > >> e.g., for reprocessing, batching, searching, etc.
> > >>
> > >>
> > >> Thanks,
> > >> Magnus
> > >>
> > >>
> > >>
> > >> >
> > >> > Have a good long weekend!
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Sep 2, 2016 at 6:10 PM, Ismael Juma <ism...@juma.me.uk>
> > wrote:
> > >> >
> > >> > > Thanks for the proposal Becket. Looks good overall, a few
> comments:
> > >> > >
> > >> > > ListOffsetResponse => [TopicName [PartitionOffsets]]
> > >> > > >   PartitionOffsets => Partition ErrorCode Timestamp [Offset]
> > >> > > >   Partition => int32
> > >> > > >   ErrorCode => int16
> > >> > > >   Timestamp => int64
> > >> > > >   Offset => int
> > >> > >
> > >> > >
> > >> > > It should be int64 for `Offset` right?
> > >> > >
> > >> > > Implementation wise, we will migrate to o.a.k.common.requests.
> > >> > > ListOffsetRequest
> > >> > > > class on the broker side.
> > >> > >
> > >> > >
> > >> > > Could you clarify what you mean here? We already
> > >> > > use o.a.k.common.requests.ListOffsetRequest in KafkaApis.
> > >> > >
> > >> > > long offset = consumer.offsetForTime(Collections.singletonMap(
> > >> > > topicPartition,
> > >> > > > targetTime)).offset;
> > >> > >
> > >> > >
> > >> > > The result of `offsetForTime` is a Map, so we can't just call
> > >> `offset` on
> > >> > > it. You probably meant something like:
> > >> > >
> > >> > > long offset = consumer.offsetForTime(Collections.singletonMap(
> > >> > > topicPartition,
> > >> > > targetTime)).get(topicPartition).offset;
> > >> > >
> > >> > > Test searchByTimestamp with CreateTime and LogAppendTime
> > >> > > >
> > >> > >
> > >> > > Do you mean `Test offsetForTime`?
> > >> > >
> > >> > > And:
> > >> > >
> > >> > > 1. In KAFKA-1588, the following issue was described "When
> performing
> > >> an
> > >> > > OffsetRequest, if you request the same topic and partition
> > combination
> > >> > in a
> > >> > > single request more than once (for example, if you want to get
> both
> > >> the
> > >> > > head and tail offsets for a partition in the same request), you
> will
> > >> get
> > >> > a
> > >> > > response for both, but they will be the same offset". Will the new
> > >> > request
> > >> > > version support the use case where multiple timestamps are passed
> > for
> > >> the
> > >> > > same topic partition? And if we do support it at the protocol
> level,
> > >> do
> > >> > we
> > >> > > also want to support it at the API level or do we think the
> > additional
> > >> > > complexity is not worth it?
> > >> > >
> > >> > > 2. Is `offsetForTime` the right method name given that we are
> > getting
> > >> > > multiple offsets? Maybe it should be `offsetsForTimes` or
> something
> > >> like
> > >> > > that.
> > >> > >
> > >> > > Ismael
> > >> > >
> > >> > > On Wed, Aug 31, 2016 at 4:38 AM, Becket Qin <becket....@gmail.com
> >
> > >> > wrote:
> > >> > >
> > >> > > > Hi Kafka devs,
> > >> > > >
> > >> > > > I created KIP-79 to allow consumer to precisely query the
> offsets
> > >> based
> > >> > > on
> > >> > > > timestamp.
> > >> > > >
> > >> > > > In short we propose to :
> > >> > > > 1. add a ListOffsetRequest/ListOffsetResponse v1, and
> > >> > > > 2. add an offsetForTime() method in new consumer.
> > >> > > >
> > >> > > > The KIP wiki is the following:
> > >> > > > https://cwiki.apache.org/confluence/pages/viewpage.
> > >> > > action?pageId=65868090
> > >> > > >
> > >> > > > Comments are welcome.
> > >> > > >
> > >> > > > Thanks,
> > >> > > >
> > >> > > > Jiangjie (Becket) Qin
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Reply via email to