Hi Guozhang,

I thought about this again, and it seems we still need the
time.index.interval.ms configuration to avoid unnecessarily frequent time
index insertion.

I just updated the wiki to add index.interval.bytes as an additional
constraint on time index entry insertion. Another slight change is that
as long as a message's timestamp shows that time.index.interval.ms has
passed since the timestamp of the last time index entry, we will insert
another time index entry. Previously, we always inserted time index
entries at time.index.interval.ms bucket boundaries.
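
A minimal Python sketch of the insertion rule described above, assuming
both constraints must hold before a new entry is added. TimeIndexSketch
and maybe_append are hypothetical names (not Kafka code), entries are
assumed to be (timestamp_ms, offset) pairs, and the first message of a
segment is assumed to always get an entry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TimeIndexSketch:
    """Hypothetical in-memory time index; entries are (timestamp_ms, offset)."""
    index_interval_bytes: int    # mirrors index.interval.bytes
    time_index_interval_ms: int  # mirrors time.index.interval.ms
    entries: List[Tuple[int, int]] = field(default_factory=list)
    bytes_since_last_entry: int = 0

    def maybe_append(self, offset: int, timestamp_ms: int, size_bytes: int) -> bool:
        """Record one appended message; return True if a time index entry was added."""
        self.bytes_since_last_entry += size_bytes
        if not self.entries:
            # Assumption: the first message of a segment always gets an entry.
            self._add(timestamp_ms, offset)
            return True
        last_ts = self.entries[-1][0]
        enough_bytes = self.bytes_since_last_entry >= self.index_interval_bytes
        # Elapsed time is measured from the last entry's timestamp,
        # not from a fixed time.index.interval.ms bucket boundary.
        enough_time = timestamp_ms - last_ts >= self.time_index_interval_ms
        if enough_bytes and enough_time:
            self._add(timestamp_ms, offset)
            return True
        return False

    def _add(self, timestamp_ms: int, offset: int) -> None:
        self.entries.append((timestamp_ms, offset))
        self.bytes_since_last_entry = 0
```

For example, with index_interval_bytes=4096 and time_index_interval_ms=60000,
appending (offset, timestamp, size) = (0, 0, 1000), (1, 70000, 1000),
(2, 80000, 5000), (3, 90000, 5000), (4, 150000, 100) adds entries only
for offsets 0, 2 and 4: offset 1 fails the byte constraint and offset 3
fails the time constraint.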

Thanks,

Jiangjie (Becket) Qin

On Wed, Feb 24, 2016 at 2:40 PM, Becket Qin <becket....@gmail.com> wrote:

> Thanks for the comments, Guozhang,
>
> I just changed the configuration name to "time.index.interval.ms".
>
> It seems the real question here is how big the time indices will be.
> Theoretically, we can have one time index entry for each message in a log
> segment. For example, if one event is appended per minute, we might need
> a time index entry for every message until the segment size is reached.
> In that case, the number of entries in the time index would be
> (segment size / avg message size), so the time index file can
> potentially be big.
>
> I am wondering if we can simply reuse the "index.interval.bytes"
> configuration instead of having a separate time.index.interval.ms. That
> is, instead of inserting a new entry based on a time interval, we would
> still insert it based on a byte interval. This does not affect the
> granularity, because we can search forward from the nearest index entry
> to find the message with the correct timestamp. The good thing is that
> this guarantees the time indices will not become huge. We also avoid
> adding a new configuration.
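
A rough Python sketch of the byte-interval alternative proposed above,
assuming message timestamps never decrease within a segment and using a
message's position in a list as a stand-in for its file position. The
function names and the (offset, timestamp_ms, size_bytes) layout are
hypothetical, not Kafka code.

```python
import bisect
from typing import List, Optional, Tuple

Message = Tuple[int, int, int]  # (offset, timestamp_ms, size_bytes); hypothetical layout


def build_byte_interval_index(segment: List[Message],
                              index_interval_bytes: int) -> List[Tuple[int, int]]:
    """Sparse entries (timestamp_ms, position), added roughly every index_interval_bytes."""
    entries: List[Tuple[int, int]] = []
    bytes_since_last = index_interval_bytes  # forces an entry for the first message
    for pos, (_, ts, size) in enumerate(segment):
        if bytes_since_last >= index_interval_bytes:
            entries.append((ts, pos))
            bytes_since_last = 0
        bytes_since_last += size
    return entries


def find_first_offset_at_or_after(segment: List[Message],
                                  entries: List[Tuple[int, int]],
                                  target_ts: int) -> Optional[int]:
    """Offset of the first message with timestamp >= target_ts, or None if there is none."""
    entry_ts = [ts for ts, _ in entries]
    i = bisect.bisect_left(entry_ts, target_ts)  # first entry with ts >= target_ts
    # Start from the nearest entry strictly before the target (or the segment start),
    # then scan forward so timestamp granularity is not limited by index spacing.
    start = entries[i - 1][1] if i > 0 else 0
    for offset, ts, _ in segment[start:]:
        if ts >= target_ts:
            return offset
    return None
```

With byte-based insertion, the entry count is bounded by roughly
(segment bytes / index.interval.bytes) regardless of how slowly messages
arrive: ten 1000-byte messages with a 3000-byte interval yield four
entries, and a lookup for timestamp 4500 starts at the entry for
timestamp 3000 and scans forward to the first message at or after 4500.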
>
> What do you think?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Feb 24, 2016 at 1:00 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
>> Thanks Jiangjie, a few comments on the wiki:
>>
>> 1. Rename the config "time.index.interval" to "time.index.interval.ms" to
>> be consistent. Also, do we need a "time.index.size.max.bytes" as well?
>>
>> 2. Will the memory mapped index file for timestamp have the same default
>> initial / max size (10485760) as the offset index?
>>
>> Otherwise LGTM.
>>
>> Guozhang
>>
>> On Tue, Feb 23, 2016 at 5:05 PM, Becket Qin <becket....@gmail.com> wrote:
>>
>> > Bump.
>> >
>> > Per Jun's comments during the KIP hangout, I have updated the wiki with
>> > the upgrade plan for KIP-33.
>> >
>> > Let's vote!
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Wed, Feb 3, 2016 at 10:32 AM, Becket Qin <becket....@gmail.com> wrote:
>> >
>> > > Hi all,
>> > >
>> > > I would like to initiate the vote for KIP-33.
>> > >
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
>> > >
>> > > A good amount of this KIP was already touched on during the discussion
>> > > of KIP-32, so I have also put the link to KIP-32 here for reference.
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>
>
>
