Thanks for the feedback, Guozhang. I'll update the KIP wiki and submit a patch if no one has concerns about this.
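[Editor's note] The rolling rule discussed in the quoted thread below — roll the segment when a message's timestamp exceeds the first message's timestamp in the segment by more than log.roll.ms, with message.timestamp.difference.max.ms bounding how far a timestamp may stray from the broker's clock — can be sketched as follows. This is an illustrative Python sketch, not the actual Kafka broker implementation; all names are made up.

```python
# Illustrative sketch of timestamp-based log rolling (NOT actual Kafka
# broker code). A segment rolls when the incoming message's timestamp
# exceeds the segment's first timestamp by more than log.roll.ms;
# message.timestamp.difference.max.ms rejects timestamps that stray too
# far from the broker's wall clock, so no excessively large timestamp
# can force frequent rolling.

def should_roll(first_timestamp_in_segment, message_timestamp, log_roll_ms):
    """Roll if the current message's timestamp is more than log.roll.ms
    ahead of the first timestamp in the active segment."""
    if first_timestamp_in_segment is None:
        # Empty segment: nothing to compare against yet.
        return False
    return message_timestamp - first_timestamp_in_segment > log_roll_ms

def timestamp_acceptable(message_timestamp, broker_wall_clock_ms, max_diff_ms):
    """Reject messages whose timestamp differs from the broker's wall
    clock by more than message.timestamp.difference.max.ms."""
    return abs(message_timestamp - broker_wall_clock_ms) <= max_diff_ms
```

Together, these two checks capture the two guarantees mentioned in the thread: the segment rolls within a bounded time regardless of the wall clock, and a bad timestamp cannot cause a roll on every append.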
On Wed, Aug 31, 2016 at 11:59 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Some of the streams integration tests also encounter this issue, where the
> timestamps we use in the tests are very small (e.g. 1, 2, 3...), which
> causes the log to roll upon each append, and the old segment gets deleted
> very soon. Arguably this can be resolved by enforcing the LogAppendTime
> configuration on the embedded server.
>
> +1 on the proposed change, makes sense to me.
>
> Guozhang
>
> On Tue, Aug 30, 2016 at 4:33 PM, Becket Qin <becket....@gmail.com> wrote:
>
> > Hi folks,
> >
> > Here is another update on the change of time-based log rolling.
> >
> > After the latest implementation, we encountered KAFKA-4099. The issue is
> > that if users move replicas, then for the messages in the old segments,
> > the new replica will create one log segment for each message. The root
> > cause is that we are comparing the wall clock time with the message
> > timestamp. A solution is described in KAFKA-4099, which is to base log
> > rolling purely on the timestamps in the messages. More specifically, we
> > roll the log segment if the timestamp in the current message is greater
> > than the timestamp of the first message in the segment by more than
> > log.roll.ms. This approach is wall-clock independent and should solve
> > the problem. With the message.timestamp.difference.max.ms configuration,
> > we can ensure that 1) the log segment will be rolled in a bounded time,
> > and 2) no excessively large timestamp will be accepted and cause
> > frequent log rolling.
> >
> > Any concerns regarding this change?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Jun 13, 2016 at 2:30 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > > Thanks Jiangjie,
> > >
> > > I see the need for sensitive data purging; the above proposed change
> > > LGTM.
> > > One minor concern is that a wrongly marked timestamp on the first
> > > record could cause the segment to roll much later / earlier, though it
> > > may be rare.
> > >
> > > Guozhang
> > >
> > > On Fri, Jun 10, 2016 at 10:07 AM, Becket Qin <becket....@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > During the implementation of KIP-33, we found it might be useful to
> > > > have a more deterministic time-based log rolling than what was
> > > > proposed in the KIP.
> > > >
> > > > The current KIP proposal uses the largest timestamp in the segment
> > > > for time-based rolling. The active log segment only rolls when no
> > > > message has been appended within max.roll.ms of the largest
> > > > timestamp in the segment, i.e. the rolling time may change if users
> > > > keep appending messages to the segment. This may not be a desirable
> > > > behavior for people who have sensitive data and want to make sure it
> > > > is removed after some time.
> > > >
> > > > To solve the above issue, we want to modify the KIP proposal so that
> > > > time-based log rolling behaves as follows. Time-based log rolling
> > > > will be based on the first message with a timestamp in the log
> > > > segment, if there is such a message. If no message in the segment
> > > > has a timestamp, time-based log rolling will still be based on the
> > > > log segment create time, which is the same as what we are doing now.
> > > > The reasons we don't want to always roll based on file create time
> > > > are: 1) the message timestamp may be assigned by clients, which can
> > > > differ from the create time of the log segment file; and 2) on some
> > > > Linux systems, the file create time is not available, so using the
> > > > segment file create time may not always work.
> > > >
> > > > Do people have any concerns about this change? I will update the KIP
> > > > if people think the change is OK.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Apr 19, 2016 at 6:27 PM, Becket Qin <becket....@gmail.com> wrote:
> > > >
> > > > > Thanks Joel and Ismael. I just updated the KIP based on your
> > > > > feedback.
> > > > >
> > > > > KIP-33 has passed with +4 (binding) and +2 (non-binding).
> > > > >
> > > > > Thanks everyone for the reading, feedback and voting!
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Tue, Apr 19, 2016 at 5:25 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > > > >
> > > > >> Thanks Becket. I think it would be nice to update the KIP with
> > > > >> regard to points 3 and 4.
> > > > >>
> > > > >> In any case, +1 (non-binding)
> > > > >>
> > > > >> Ismael
> > > > >>
> > > > >> On Tue, Apr 19, 2016 at 2:03 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > >>
> > > > >> > Thanks for the comments Ismael. Please see the replies inline.
> > > > >> >
> > > > >> > On Mon, Apr 18, 2016 at 6:50 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> > > > >> >
> > > > >> > > Hi Jiangjie,
> > > > >> > >
> > > > >> > > Thanks for the KIP, it's a nice improvement. Since it seems
> > > > >> > > like we have been using the voting thread for discussion,
> > > > >> > > I'll do the same.
> > > > >> > >
> > > > >> > > A few minor comments/questions:
> > > > >> > >
> > > > >> > > 1. The proposed name for the time index file is
> > > > >> > > `SegmentBaseOffset.timeindex`. Would
> > > > >> > > `SegmentBaseOffset.time-index` be a little better? It would
> > > > >> > > clearly separate the type of index in case we add additional
> > > > >> > > index types in the future.
> > > > >> > >
> > > > >> > I have no strong opinion on this. I am not adding any separator
> > > > >> > because it is more regex friendly.
> > > > >> > I am not sure about the other indexes; time and space seem to
> > > > >> > be the two most common dimensions.
> > > > >> >
> > > > >> > > 2. When describing the time index entry, we say "Offset - the
> > > > >> > > next offset when the time index entry is inserted". I found
> > > > >> > > the mention of `next` a bit confusing, as it looks to me like
> > > > >> > > the time index entry has the first offset in the message set.
> > > > >> > >
> > > > >> > This semantic meaning is a little different from the offset
> > > > >> > index. The offset index information is self-contained by
> > > > >> > nature, i.e. all the offsets before are smaller than the offset
> > > > >> > of this message set. So we only need to say "the offset of this
> > > > >> > message set is OFFSET". This does not quite apply to the time
> > > > >> > index because the max timestamp may or may not be in the
> > > > >> > message set being appended. So we have to either say "the max
> > > > >> > timestamp before I append this message set is T", or "the max
> > > > >> > timestamp after I appended this message set is T". The former
> > > > >> > case means that we can skip all the previous messages if we are
> > > > >> > looking for a timestamp > T and start from this offset. The
> > > > >> > latter one means that if we are searching for timestamp > T, we
> > > > >> > should start after this message set, which is essentially the
> > > > >> > same as the former case but requires an additional
> > > > >> > interpretation.
> > > > >> >
> > > > >> > > 3. We say "The default initial / max size of the time index
> > > > >> > > files is the same as the offset index files.
> > > > >> > > (time index entry is 1.5x of the size of offset index entry,
> > > > >> > > user should set the configuration accordingly)". It may be
> > > > >> > > worth elaborating a little on what a user should do with
> > > > >> > > regard to this configuration when upgrading (i.e. maybe under
> > > > >> > > "Compatibility, Deprecation, and Migration Plan").
> > > > >> > >
> > > > >> > Makes sense.
> > > > >> >
> > > > >> > > 4. In a previous vote thread, Jun said "The simplest thing is
> > > > >> > > probably to change the default index size to 2MB to match the
> > > > >> > > default log segment size" and you seemed to agree. I couldn't
> > > > >> > > find anything about this in the KIP. Are we still doing it?
> > > > >> > >
> > > > >> > Yes, we can still make the change for the default settings.
> > > > >> > Users might want to set the index size a little larger if they
> > > > >> > have a customized size, but in reality it should not cause
> > > > >> > problems other than rolling a few more log segments.
> > > > >> >
> > > > >> > > 5. We say "Instead, it is only monotonically increasing
> > > > >> > > within each time index file. i.e. It is possible that the
> > > > >> > > time index file for a later log segment contains smaller
> > > > >> > > timestamp than some timestamp in the time index file of an
> > > > >> > > earlier segment.". I think it would be good to explain under
> > > > >> > > which scenario a time index file for a later log segment
> > > > >> > > contains a smaller timestamp (is this only when CreateTime is
> > > > >> > > used?).
> > > > >> > >
> > > > >> > Yes, it only happens when CreateTime is used.
> > > > >> >
> > > > >> > > 6.
> > > > >> > > We say "When searching by timestamp, broker will start from
> > > > >> > > the earliest log segment and check the last time index
> > > > >> > > entry.". The existing logic searches from the newest segment
> > > > >> > > backwards. Is there a reason why we are changing it?
> > > > >> > >
> > > > >> > Suppose segment 0 has max timestamp 100, segment 1 has max
> > > > >> > timestamp 50, and segment 2 has max timestamp 90, and now a
> > > > >> > user wants to search for timestamp 80. If we search backwards,
> > > > >> > we have to look at all the segments. If we search forward, we
> > > > >> > will stop at the first segment whose max timestamp is greater
> > > > >> > than 80 (i.e. all the previous segments have smaller max
> > > > >> > timestamps) and start the finer search on that segment.
> > > > >> >
> > > > >> > > 7. Do you mind if I fix typos and minor grammar issues
> > > > >> > > directly in the wiki? It seems easier than doing that via
> > > > >> > > email.
> > > > >> > >
> > > > >> > Not at all, thanks for the help.
> > > > >> >
> > > > >> > > Thanks,
> > > > >> > > Ismael
> > > > >> > >
> > > > >> > > On Thu, Apr 7, 2016 at 1:44 AM, Becket Qin <becket....@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > Hi all,
> > > > >> > > >
> > > > >> > > > I updated KIP-33 based on the initial implementation. Per
> > > > >> > > > the discussion on yesterday's KIP hangout, I would like to
> > > > >> > > > initiate the new vote thread for KIP-33.
> > > > >> > > >
> > > > >> > > > The KIP wiki:
> > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
> > > > >> > > >
> > > > >> > > > Here is a brief summary of the KIP:
> > > > >> > > > 1.
> > > > >> > > > We propose to add a time index for each log segment.
> > > > >> > > > 2. The time indices are going to be used for log retention,
> > > > >> > > > log rolling, and message search by timestamp.
> > > > >> > > >
> > > > >> > > > There was an old voting thread which has some discussions
> > > > >> > > > on this KIP. The mail thread link is the following:
> > > > >> > > > http://mail-archives.apache.org/mod_mbox/kafka-dev/201602.mbox/%3ccabtagwgoebukyapfpchmycjk2tepq3ngtuwnhtr2tjvsnc8...@mail.gmail.com%3E
> > > > >> > > >
> > > > >> > > > I have the following WIP patch for reference. It needs a
> > > > >> > > > few more unit tests and documentation. Other than that it
> > > > >> > > > should run fine.
> > > > >> > > > https://github.com/becketqin/kafka/commit/712357a3fbf1423e05f9eed7d2fed5b6fe6c37b7
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > >
> > > > >> > > > Jiangjie (Becket) Qin
> > >
> > > --
> > > -- Guozhang
>
> --
> -- Guozhang
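[Editor's note] The forward search Becket describes above — start from the earliest segment and stop at the first one whose max timestamp is greater than the target, since all earlier segments then have smaller max timestamps — can be sketched as follows. This is an illustrative Python sketch under assumed names, not the broker's actual Log/LogSegment code.

```python
# Illustrative sketch of the forward search by timestamp described in
# the thread (NOT actual Kafka code). Segments are ordered oldest to
# newest; each list entry is that segment's max timestamp. We return
# the index of the first segment whose max timestamp exceeds the
# target, which is where the finer-grained time index lookup would
# begin. A backwards search could not stop early, because later
# segments may have smaller max timestamps (e.g. with CreateTime).

def find_segment_for_timestamp(segment_max_timestamps, target):
    for i, max_ts in enumerate(segment_max_timestamps):
        if max_ts > target:
            # All earlier segments have max timestamp <= target.
            return i
    return None  # no segment contains a message newer than target

# Becket's example: segments with max timestamps 100, 50, 90; searching
# for timestamp 80 stops at the very first segment.
```

With Becket's example, `find_segment_for_timestamp([100, 50, 90], 80)` stops at segment 0, whereas a backwards scan would have to examine every segment before concluding where timestamp 80 could first appear.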