I'm having some trouble reconciling the current proposal with your original
requirement, which was essentially to be able to purge log data up to a
precise point (an offset). The KIP currently suggests that timestamp-based
deletion would only work with LogAppendTime, so it does not seem
significantly different from time-based retention (after KIP-32/33). In
other words, it appears to me that you would need to use CreateTime and not
LogAppendTime. Also, one of the rejected alternatives observes that changing
the existing configuration settings to try to flush ranges of a given
partition's log is problematic, but it seems to me you would have to do
this with timestamp-based deletion as well, right? It would be useful for me
if you or anyone else could go over the exact mechanics/workflow for
accomplishing precise purges at today's KIP meeting.
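
To make the question concrete, here is a minimal Python sketch of the
segment-level check as described in Jun's comment quoted further down: a
segment is deleted only if its largest timestamp is less than
log.retention.min.timestamp. The Segment class, the segments_to_delete
function, and the sample offsets and timestamps are hypothetical
illustrations, not Kafka's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for a log segment (not Kafka's real class)."""
    base_offset: int
    timestamps: List[int]  # one timestamp (ms) per message, in offset order

    @property
    def largest_timestamp(self) -> int:
        return max(self.timestamps)


def segments_to_delete(segments: List[Segment], min_timestamp: int) -> List[Segment]:
    """Return segments whose largest timestamp is below min_timestamp.

    No message with a timestamp at or above min_timestamp is ever deleted,
    but older messages survive if they share a segment with a newer one.
    """
    return [s for s in segments if s.largest_timestamp < min_timestamp]


# Illustrative data: with CreateTime, timestamps need not be ordered.
log = [
    Segment(base_offset=0, timestamps=[100, 110, 105]),
    Segment(base_offset=3, timestamps=[120, 90, 200]),  # old and new mixed
    Segment(base_offset=6, timestamps=[210, 220]),
]

deleted = segments_to_delete(log, min_timestamp=150)
print([s.base_offset for s in deleted])  # only the segment at base offset 0
```

In this sketch the messages at offsets 3 and 4 (timestamps 120 and 90)
survive even though they are older than the threshold, which is why a
segment-level check does not by itself purge up to a precise offset.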

Thanks,

Joel

On Monday, February 22, 2016, Bill Warshaw <wdwars...@gmail.com> wrote:

> Sounds good.  I'll hold off on sending out a VOTE thread until after the
> KIP meeting tomorrow.
>
> On Mon, Feb 22, 2016 at 12:56 PM, Becket Qin <becket....@gmail.com> wrote:
>
> > Hi Jun,
> >
> > I think it makes sense to implement KIP-47 after KIP-33 so we can make it
> > work for both LogAppendTime and CreateTime.
> >
> > And yes, I'm actively working on KIP-33. I had a voting thread on KIP-33
> > before and I'll bump it up.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Mon, Feb 22, 2016 at 9:11 AM, Jun Rao <j...@confluent.io> wrote:
> >
> > > Becket,
> > >
> > > Since you submitted KIP-33, are you actively working on that? If so, it
> > > would make sense to implement KIP-47 after KIP-33 so that it works for
> > > both CreateTime and LogAppendTime.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > >
> > >
> > > On Fri, Feb 19, 2016 at 6:25 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > 1.  I thought more about Andrew's comment about LogAppendTime.  The
> > > > time-based index you are referring to is associated with KIP-33,
> > > > correct?  Currently my implementation is just checking the last message
> > > > in a segment, so we're restricted to LogAppendTime.  When the work for
> > > > KIP-33 is completed, it sounds like CreateTime would also be valid.  Do
> > > > you happen to know if anyone is currently working on KIP-33?
> > > >
> > > > 2. I did update the wiki after reading your original comment, but
> > > > reading over it again I realize I could word a couple things more
> > > > clearly.  I will do that tonight.
> > > >
> > > > Bill
> > > >
> > > > On Fri, Feb 19, 2016 at 7:02 PM, Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Hi, Bill,
> > > > >
> > > > > I replied with the following comments earlier to the thread. Did you
> > > > > see that?
> > > > >
> > > > > Thanks for the proposal. A couple of comments.
> > > > >
> > > > > 1. It seems that this new policy should work for CreateTime as well.
> > > > > If a topic is configured with CreateTime, messages may not be added in
> > > > > strict order in the log. However, to build a time-based index, we will
> > > > > be maintaining the largest timestamp for all messages in a log segment.
> > > > > We can delete a segment if its largest timestamp is less than
> > > > > log.retention.min.timestamp. This guarantees that no messages newer
> > > > > than log.retention.min.timestamp will be deleted, which is probably
> > > > > what the user wants.
> > > > >
> > > > > 2. Right now, the user can specify "delete" as the retention policy
> > > > > and a log segment will be deleted either when the size of a partition
> > > > > exceeds a threshold or the timestamp of a segment is older than a
> > > > > relative period of time (say 7 days) from now. What you are proposing
> > > > > is not a new retention policy, but an additional check that will cause
> > > > > a segment to be deleted when the timestamp of a segment is older than
> > > > > an absolute timestamp? If so, could you update the wiki accordingly?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Fri, Feb 19, 2016 at 2:57 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > What is the next step with this proposal?  The work for KIP-32 that it
> > > > > > is based on was merged earlier today
> > > > > > (https://github.com/apache/kafka/pull/764, thank you Becket).  I have
> > > > > > an implementation with tests, and I've confirmed that it actually
> > > > > > works in a live system.  Is there more discussion that needs to be had
> > > > > > about this KIP, or should I start a VOTE thread?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Feb 16, 2016 at 5:06 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > >
> > > > > > > Bill,
> > > > > > >
> > > > > > > Thanks for the proposal. A couple of comments.
> > > > > > >
> > > > > > > 1. It seems that this new policy should work for CreateTime as
> > > > > > > well. If a topic is configured with CreateTime, messages may not be
> > > > > > > added in strict order in the log. However, to build a time-based
> > > > > > > index, we will be maintaining the largest timestamp for all
> > > > > > > messages in a log segment. We can delete a segment if its largest
> > > > > > > timestamp is less than log.retention.min.timestamp. This guarantees
> > > > > > > that no messages newer than log.retention.min.timestamp will be
> > > > > > > deleted, which is probably what the user wants.
> > > > > > >
> > > > > > > 2. Right now, the user can specify "delete" as the retention policy
> > > > > > > and a log segment will be deleted either when the size of a
> > > > > > > partition exceeds a threshold or the timestamp of a segment is
> > > > > > > older than a relative period of time (say 7 days) from now. What
> > > > > > > you are proposing is not a new retention policy, but an additional
> > > > > > > check that will cause a segment to be deleted when the timestamp of
> > > > > > > a segment is older than an absolute timestamp? If so, could you
> > > > > > > update the wiki accordingly?
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sat, Feb 13, 2016 at 3:23 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > That is a good catch, thanks for pointing it out.  If this KIP is
> > > > > > > > accepted, we'd need to document this and make the log cleaner not
> > > > > > > > run timestamp-based deletion unless
> > > > > > > > message.timestamp.type=LogAppendTime.
> > > > > > > >
> > > > > > > > On Sat, Feb 13, 2016 at 5:38 AM, Andrew Schofield <
> > > > > > > > andrew_schofield_j...@outlook.com> wrote:
> > > > > > > >
> > > > > > > > > This KIP is related to KIP-32, but it strikes me that it only
> > > > > > > > > makes sense with one of the two proposed message timestamp
> > > > > > > > > types. If I understand correctly, message timestamps are only
> > > > > > > > > certain to be monotonically increasing in the log if
> > > > > > > > > message.timestamp.type=LogAppendTime.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Does timestamp-based auto-expiration require use of
> > > > > > > > > message.timestamp.type=LogAppendTime?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think this KIP is a good idea, but I think it relies on strict
> > > > > > > > > ordering of timestamps to be workable.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Andrew Schofield
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Date: Fri, 12 Feb 2016 10:38:46 -0800
> > > > > > > > > > Subject: Re: [DISCUSS] KIP-47 - Add timestamp-based log deletion policy
> > > > > > > > > > From: n...@confluent.io
> > > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > >
> > > > > > > > > > Adding a timestamp based auto-expiration is useful and this
> > > > > > > > > > proposal makes sense. Thx!
> > > > > > > > > >
> > > > > > > > > > On Wed, Feb 10, 2016 at 3:35 PM, Jay Kreps  wrote:
> > > > > > > > > >
> > > > > > > > > >> I think this makes a lot of sense and won't be hard to
> > > > > > > > > >> implement and doesn't create too much in the way of new
> > > > > > > > > >> interfaces.
> > > > > > > > > >>
> > > > > > > > > >> -Jay
> > > > > > > > > >>
> > > > > > > > > >> On Tue, Feb 9, 2016 at 8:13 AM, Bill Warshaw  wrote:
> > > > > > > > > >>
> > > > > > > > > >>> Hello,
> > > > > > > > > >>>
> > > > > > > > > >>> I just submitted KIP-47 for adding a new log deletion policy
> > > > > > > > > >>> based on a minimum timestamp of messages to retain.
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > > > > > >>>
> > > > > > > > > >>> I'm open to any comments or suggestions.
> > > > > > > > > >>>
> > > > > > > > > >>> Thanks,
> > > > > > > > > >>> Bill Warshaw
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Thanks,
> > > > > > > > > > Neha
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
