Brett,

Yes, I will post the PR tomorrow.

Xiongqi (Wesley) Wu


On Sun, Sep 2, 2018 at 6:28 PM Brett Rann <br...@zendesk.com.invalid> wrote:

> +1 (non-binding) from me on the interface. I'd like to see someone familiar
> with the code comment on the approach, and note there are a couple of
> different approaches: what's documented in the KIP, and what Xiaohe Dong
> was working on here:
>
> https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0
>
> If you have working code already, Xiongqi Wu, could you share a PR? I'd be
> happy to start testing.
>
> On Tue, Aug 28, 2018 at 5:57 AM xiongqi wu <xiongq...@gmail.com> wrote:
>
> > Hi All,
> >
> > Do you have any additional comments on this KIP?
> >
> >
> > On Thu, Aug 16, 2018 at 9:17 PM, xiongqi wu <xiongq...@gmail.com> wrote:
> >
> > > On 2)
> > > The offset map is built starting from the first dirty segment.
> > > The compaction itself starts from the beginning of the log partition;
> > > that's how it ensures the deletion of tombstoned keys.
> > > I will double-check tomorrow.
> > >
> > > Xiongqi (Wesley) Wu
> > >
> > >
> > > On Thu, Aug 16, 2018 at 6:46 PM Brett Rann <br...@zendesk.com.invalid>
> > > wrote:
> > >
> > >> To clarify a bit on 1: whether there's an external storage/DB isn't
> > >> relevant here. Compacted topics allow a tombstone record to be sent (a
> > >> null value for a key), which currently results in old values for that
> > >> key being deleted if some conditions are met. There are existing
> > >> controls to make sure the old values stay around for at least a minimum
> > >> time, but no dedicated control to ensure the tombstone takes effect
> > >> within a maximum time.
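> > >>
> > >> To make "tombstone" concrete, here is a minimal sketch with the Java
> > >> producer (the topic name, key, and bootstrap address are made up for
> > >> illustration):
> > >>
> > >>   import java.util.Properties;
> > >>   import org.apache.kafka.clients.producer.KafkaProducer;
> > >>   import org.apache.kafka.clients.producer.Producer;
> > >>   import org.apache.kafka.clients.producer.ProducerRecord;
> > >>
> > >>   public class SendTombstone {
> > >>       public static void main(String[] args) {
> > >>           Properties props = new Properties();
> > >>           props.put("bootstrap.servers", "localhost:9092");
> > >>           props.put("key.serializer",
> > >>               "org.apache.kafka.common.serialization.StringSerializer");
> > >>           props.put("value.serializer",
> > >>               "org.apache.kafka.common.serialization.StringSerializer");
> > >>           try (Producer<String, String> producer = new KafkaProducer<>(props)) {
> > >>               // A null value is the tombstone: once compaction runs (and
> > >>               // delete.retention.ms passes), "user-123" disappears.
> > >>               producer.send(new ProducerRecord<>("pii-topic", "user-123", null));
> > >>           }
> > >>       }
> > >>   }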
> > >>
> > >> One popular reason a maximum time for deletion is desirable right now
> > >> is GDPR with PII. But we're not proposing any GDPR awareness in Kafka,
> > >> just being able to guarantee a max time by which a tombstoned key will
> > >> be removed from the compacted topic.
> > >>
> > >> On 2)
> > >> Huh, I thought it kept track of the first dirty segment and didn't
> > >> recompact older "clean" ones.
> > >> But I didn't look at the code or test for that.
> > >>
> > >> On Fri, Aug 17, 2018 at 10:57 AM xiongqi wu <xiongq...@gmail.com>
> > wrote:
> > >>
> > >> > 1. The owner of the data (in this sense, Kafka is not the owner)
> > >> > should keep track of the lifecycle of the data in some external
> > >> > storage/DB. The owner determines when to delete the data and sends
> > >> > the delete request to Kafka. Kafka doesn't know about the content of
> > >> > the data; it only provides a means for deletion.
> > >> >
> > >> > 2. Each time compaction runs, it starts from the first segment (no
> > >> > matter whether it has been compacted or not). The time estimation
> > >> > here is only used to determine whether we should run compaction on
> > >> > this log partition at all, so we only need to estimate timestamps for
> > >> > uncompacted segments.
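> > >> >
> > >> > In other words, the eligibility check is roughly the following (a
> > >> > hypothetical sketch; the type and method names are made up for
> > >> > illustration and are not Kafka's actual internals):
> > >> >
> > >> >   import java.util.List;
> > >> >
> > >> >   // Illustrative stand-in for a log segment's metadata.
> > >> >   interface SegmentInfo {
> > >> >       long estimatedEarliestTimestampMs();
> > >> >   }
> > >> >
> > >> >   final class CompactionEligibility {
> > >> >       // Compact the partition now if any uncompacted segment holds a
> > >> >       // record older than max.compaction.lag.ms.
> > >> >       static boolean mustCompactNow(List<SegmentInfo> uncompacted,
> > >> >                                     long maxCompactionLagMs, long nowMs) {
> > >> >           return uncompacted.stream().anyMatch(
> > >> >               s -> nowMs - s.estimatedEarliestTimestampMs() > maxCompactionLagMs);
> > >> >       }
> > >> >   }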
> > >> >
> > >> > On Thu, Aug 16, 2018 at 5:35 PM, Dong Lin <lindon...@gmail.com>
> > wrote:
> > >> >
> > >> > > Hey Xiongqi,
> > >> > >
> > >> > > Thanks for the update. I have two questions for the latest KIP.
> > >> > >
> > >> > > 1) The motivation section says that one use case is to delete PII
> > >> > > (Personally Identifiable Information) data within 7 days while
> > >> > > keeping non-PII data indefinitely in compacted format. I suppose the
> > >> > > use case depends on the application to determine when to delete that
> > >> > > PII data. Could you explain how the application can reliably
> > >> > > determine the set of keys that should be deleted? Is the application
> > >> > > required to re-read all messages from the topic after every restart
> > >> > > and determine the keys to be deleted by looking at message
> > >> > > timestamps, or is it supposed to persist the key -> timestamp
> > >> > > information in a separate persistent storage system?
> > >> > >
> > >> > > 2) It is mentioned in the KIP that "we only need to estimate
> > >> > > earliest message timestamp for un-compacted log segments because the
> > >> > > deletion requests that belong to compacted segments have already
> > >> > > been processed". I am not sure that is correct. If a segment is
> > >> > > compacted before the user sends a message to delete a key in that
> > >> > > segment, it seems we still need to ensure that the segment will be
> > >> > > compacted again within the given time after the deletion is
> > >> > > requested, right?
> > >> > >
> > >> > > Thanks,
> > >> > > Dong
> > >> > >
> > >> > > On Thu, Aug 16, 2018 at 10:27 AM, xiongqi wu <xiongq...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi Xiaohe,
> > >> > > >
> > >> > > > Quick note:
> > >> > > > 1) Use the minimum of segment.ms and max.compaction.lag.ms.
> > >> > > >
> > >> > > > 2) I am not sure I get your second question. First, we have jitter
> > >> > > > when we roll the active segment. Second, on each compaction, we
> > >> > > > compact up to as much as the offset map allows. Together, those
> > >> > > > will not lead to a perfect compaction storm over time. In
> > >> > > > addition, I expect max.compaction.lag.ms to be set on the order of
> > >> > > > days.
> > >> > > >
> > >> > > > 3) I don't have access to the Confluent community Slack for now.
> > >> > > > I am reachable via Google Hangouts.
> > >> > > > To avoid duplicated effort, here is my plan:
> > >> > > > a) Collect more feedback and feature requirements on the KIP.
> > >> > > > b) Wait until this KIP is approved.
> > >> > > > c) Address any additional requirements in the implementation. (My
> > >> > > > current implementation only covers what is described in the KIP
> > >> > > > now.)
> > >> > > > d) Share the code with you and the community to see if you want
> > >> > > > to add anything.
> > >> > > > e) Submission through a committer.
> > >> > > >
> > >> > > >
> > >> > > > On Wed, Aug 15, 2018 at 11:42 PM, XIAOHE DONG <dannyriv...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Xiongqi
> > >> > > > >
> > >> > > > > Thanks for thinking about implementing this as well. :)
> > >> > > > >
> > >> > > > > I was thinking about using `segment.ms` to trigger the segment
> > >> > > > > roll. Also, its value is the largest additional delay for record
> > >> > > > > deletion. For example, if `segment.ms` is 1 day and
> > >> > > > > `max.compaction.lag.ms` is 30 days, the compaction may happen
> > >> > > > > around day 31.
> > >> > > > >
> > >> > > > > Out of curiosity, is there a way we can do some performance
> > >> > > > > testing for this, and are there any tools you can recommend? As
> > >> > > > > you know, previously the log was cleaned based on the dirty
> > >> > > > > ratio, but now cleaning may happen any time the max lag has
> > >> > > > > passed for some message. I wonder what would happen if clients
> > >> > > > > sent a huge amount of tombstone records at the same time.
> > >> > > > >
> > >> > > > > I am looking forward to having a quick chat with you to avoid
> > >> > > > > duplicated effort on this. I am in the Confluent community Slack
> > >> > > > > during work hours. My name is Xiaohe Dong. :)
> > >> > > > >
> > >> > > > > Rgds
> > >> > > > > Xiaohe Dong
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On 2018/08/16 01:22:22, xiongqi wu <xiongq...@gmail.com> wrote:
> > >> > > > > > Brett,
> > >> > > > > >
> > >> > > > > > Thank you for your comments.
> > >> > > > > > Since we already have an immediate-compaction setting (by
> > >> > > > > > setting the min dirty ratio to 0), I decided to use "0" as the
> > >> > > > > > disabled state. But I am OK with going with -1 (disabled) and
> > >> > > > > > 0 (immediate) instead.
> > >> > > > > >
> > >> > > > > > For the implementation, there are a few differences between
> > >> > > > > > mine and Xiaohe Dong's:
> > >> > > > > > 1) I use the estimated creation time of a log segment instead
> > >> > > > > > of the largest timestamp in the log to determine compaction
> > >> > > > > > eligibility, because a log segment might stay active for up to
> > >> > > > > > the "max compaction lag" (see the KIP for details).
> > >> > > > > > 2) I measure how many bytes we must clean to follow the "max
> > >> > > > > > compaction lag" rule, and use that to determine the order of
> > >> > > > > > compaction.
> > >> > > > > > 3) I force the active segment to roll to follow the "max
> > >> > > > > > compaction lag".
> > >> > > > > >
> > >> > > > > > I can share my code so we can coordinate.
> > >> > > > > >
> > >> > > > > > I haven't thought about a new API to force a compaction. What
> > >> > > > > > is the use case for that?
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Wed, Aug 15, 2018 at 5:33 PM, Brett Rann
> > >> > > > > > <br...@zendesk.com.invalid> wrote:
> > >> > > > > >
> > >> > > > > > > We've been looking into this too.
> > >> > > > > > >
> > >> > > > > > > Mailing list:
> > >> > > > > > > https://lists.apache.org/thread.html/ed7f6a6589f94e8c2a705553f364ef599cb6915e4c3ba9b561e610e4@%3Cdev.kafka.apache.org%3E
> > >> > > > > > > jira wish: https://issues.apache.org/jira/browse/KAFKA-7137
> > >> > > > > > > confluent slack discussion:
> > >> > > > > > > https://confluentcommunity.slack.com/archives/C49R61XMM/p1530760121000039
> > >> > > > > > >
> > >> > > > > > > A person on my team has started on code, so you might want
> > >> > > > > > > to coordinate:
> > >> > > > > > > https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0
> > >> > > > > > >
> > >> > > > > > > He's been working with Jason Gustafson and James Chen around
> > >> > > > > > > the changes. You can ping him on the Confluent Slack as
> > >> > > > > > > Xiaohe Dong.
> > >> > > > > > >
> > >> > > > > > > It's great to know others are thinking on it as well.
> > >> > > > > > >
> > >> > > > > > > You've added the requirement to force a segment roll, which
> > >> > > > > > > we hadn't gotten to yet. That's great. I was content with it
> > >> > > > > > > not including the active segment.
> > >> > > > > > >
> > >> > > > > > > > Adding topic level configuration "max.compaction.lag.ms",
> > >> > > > > > > > and corresponding broker configuration
> > >> > > > > > > > "log.cleaner.max.compaction.lag.ms", which is set to 0
> > >> > > > > > > > (disabled) by default.
> > >> > > > > > >
> > >> > > > > > > Glancing at some other settings, the convention seems to be
> > >> > > > > > > -1 for disabled (or infinite, which is more meaningful
> > >> > > > > > > here). 0 to me implies instant, i.e. a little quicker than 1.
> > >> > > > > > >
> > >> > > > > > > We've also been trying to think of a way to trigger
> > >> > > > > > > compaction through an API call, which would need to be
> > >> > > > > > > flagged somewhere (ZK admin space?), but we're struggling to
> > >> > > > > > > see how that would be coordinated across brokers and
> > >> > > > > > > partitions. Have you given any thought to that?
> > >> > > > > > >
> > >> > > > > > > On Thu, Aug 16, 2018 at 8:44 AM xiongqi wu <xiongq...@gmail.com>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Eno, Dong,
> > >> > > > > > > >
> > >> > > > > > > > I have updated the KIP. We decided not to address the
> > >> > > > > > > > issue we might have for topics with both compaction and
> > >> > > > > > > > time-based retention enabled (see rejected alternative
> > >> > > > > > > > item 2). This KIP will only ensure the log can be
> > >> > > > > > > > compacted within a specified time interval.
> > >> > > > > > > >
> > >> > > > > > > > As suggested by Dong, we will also enforce that
> > >> > > > > > > > "max.compaction.lag.ms" is not less than
> > >> > > > > > > > "min.compaction.lag.ms".
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy
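> > >> > > > > > > >
> > >> > > > > > > > For anyone wanting to see the proposal in practice, here
> > >> > > > > > > > is a sketch of creating a topic with the proposed config
> > >> > > > > > > > via the Java AdminClient. Note "max.compaction.lag.ms" is
> > >> > > > > > > > only the name proposed in this KIP (not available in any
> > >> > > > > > > > released broker yet), and the topic name and values are
> > >> > > > > > > > made up:
> > >> > > > > > > >
> > >> > > > > > > >   import java.util.Collections;
> > >> > > > > > > >   import java.util.Map;
> > >> > > > > > > >   import java.util.Properties;
> > >> > > > > > > >   import org.apache.kafka.clients.admin.AdminClient;
> > >> > > > > > > >   import org.apache.kafka.clients.admin.NewTopic;
> > >> > > > > > > >
> > >> > > > > > > >   public class CreateCompactedTopic {
> > >> > > > > > > >       public static void main(String[] args) throws Exception {
> > >> > > > > > > >           Properties props = new Properties();
> > >> > > > > > > >           props.put("bootstrap.servers", "localhost:9092");
> > >> > > > > > > >           try (AdminClient admin = AdminClient.create(props)) {
> > >> > > > > > > >               NewTopic topic = new NewTopic("pii-topic", 6, (short) 3)
> > >> > > > > > > >                   .configs(Map.of(
> > >> > > > > > > >                       "cleanup.policy", "compact",
> > >> > > > > > > >                       // proposed by KIP-354: compact within 7 days
> > >> > > > > > > >                       "max.compaction.lag.ms", "604800000"));
> > >> > > > > > > >               admin.createTopics(Collections.singleton(topic)).all().get();
> > >> > > > > > > >           }
> > >> > > > > > > >       }
> > >> > > > > > > >   }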
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On Tue, Aug 14, 2018 at 5:01 PM, xiongqi wu <xiongq...@gmail.com>
> > >> > > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > In discussion, Dong made a very good point: if
> > >> > > > > > > > > compaction and time-based retention are both enabled on
> > >> > > > > > > > > a topic, compaction might prevent records from being
> > >> > > > > > > > > deleted on time. The reason is that when compacting
> > >> > > > > > > > > multiple segments into one single segment, the newly
> > >> > > > > > > > > created segment will have the same last-modified
> > >> > > > > > > > > timestamp as the latest original segment. We lose the
> > >> > > > > > > > > timestamps of all original segments except the last one.
> > >> > > > > > > > > As a result, records might not be deleted as they should
> > >> > > > > > > > > be through time-based retention.
> > >> > > > > > > > >
> > >> > > > > > > > > With the current KIP proposal, if we want to ensure
> > >> > > > > > > > > timely deletion, we have the following configurations:
> > >> > > > > > > > > 1) Enable time-based log compaction only: deletion is
> > >> > > > > > > > > done through overriding (or tombstoning) the same key.
> > >> > > > > > > > > 2) Enable time-based log retention only: deletion is
> > >> > > > > > > > > done through time-based retention.
> > >> > > > > > > > > 3) Enable both log compaction and time-based retention:
> > >> > > > > > > > > deletion is not guaranteed.
> > >> > > > > > > > >
> > >> > > > > > > > > I am not sure whether we have use case 3 with a need for
> > >> > > > > > > > > on-time deletion. There are several options to address
> > >> > > > > > > > > the deletion issue when both compaction and retention
> > >> > > > > > > > > are enabled:
> > >> > > > > > > > > A) During log compaction, look at the record timestamp
> > >> > > > > > > > > to delete expired records. This can be done in the
> > >> > > > > > > > > compaction logic itself, or by using
> > >> > > > > > > > > AdminClient.deleteRecords(). But this assumes we have a
> > >> > > > > > > > > record timestamp.
> > >> > > > > > > > > B) Retain the last-modified time of the original
> > >> > > > > > > > > segments during log compaction. This requires either
> > >> > > > > > > > > extra metadata to record that information, or not
> > >> > > > > > > > > grouping multiple segments into one during compaction.
> > >> > > > > > > > >
> > >> > > > > > > > > If we have use case 3 in general, I would prefer
> > >> > > > > > > > > solution A and rely on the record timestamp.
> > >> > > > > > > > >
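> > >> > > > > > > > > For reference, the AdminClient.deleteRecords() path in
> > >> > > > > > > > > option A already exists today (KIP-204). A minimal
> > >> > > > > > > > > sketch, with the topic, partition, and offset made up
> > >> > > > > > > > > for illustration:
> > >> > > > > > > > >
> > >> > > > > > > > >   import java.util.Map;
> > >> > > > > > > > >   import java.util.Properties;
> > >> > > > > > > > >   import org.apache.kafka.clients.admin.AdminClient;
> > >> > > > > > > > >   import org.apache.kafka.clients.admin.RecordsToDelete;
> > >> > > > > > > > >   import org.apache.kafka.common.TopicPartition;
> > >> > > > > > > > >
> > >> > > > > > > > >   public class DeleteExpiredRecords {
> > >> > > > > > > > >       public static void main(String[] args) throws Exception {
> > >> > > > > > > > >           Properties props = new Properties();
> > >> > > > > > > > >           props.put("bootstrap.servers", "localhost:9092");
> > >> > > > > > > > >           try (AdminClient admin = AdminClient.create(props)) {
> > >> > > > > > > > >               // Delete everything before offset 42000 in partition 0.
> > >> > > > > > > > >               // The caller must first map its expiry timestamp to an
> > >> > > > > > > > >               // offset, e.g. via Consumer#offsetsForTimes.
> > >> > > > > > > > >               TopicPartition tp = new TopicPartition("pii-topic", 0);
> > >> > > > > > > > >               admin.deleteRecords(
> > >> > > > > > > > >                   Map.of(tp, RecordsToDelete.beforeOffset(42000L)))
> > >> > > > > > > > >                   .all().get();
> > >> > > > > > > > >           }
> > >> > > > > > > > >       }
> > >> > > > > > > > >   }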
> > >> > > > > > > > >
> > >> > > > > > > > > Two questions:
> > >> > > > > > > > > Do we have use case 3? Is it nice-to-have or must-have?
> > >> > > > > > > > > If we have use case 3 and want to go with solution A,
> > >> > > > > > > > > should we introduce a new configuration to enforce
> > >> > > > > > > > > deletion by timestamp?
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > On Tue, Aug 14, 2018 at 1:52 PM, xiongqi wu <xiongq...@gmail.com>
> > >> > > > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > >> Dong,
> > >> > > > > > > > >>
> > >> > > > > > > > >> Thanks for the comment.
> > >> > > > > > > > >>
> > >> > > > > > > > >> There are two retention policies: log compaction and
> > >> > > > > > > > >> time-based retention.
> > >> > > > > > > > >>
> > >> > > > > > > > >> Log compaction:
> > >> > > > > > > > >>
> > >> > > > > > > > >> We have use cases that keep infinite retention of a
> > >> > > > > > > > >> topic (compaction only). GDPR cares about deletion of
> > >> > > > > > > > >> PII (personally identifiable information) data.
> > >> > > > > > > > >> Since Kafka doesn't know which records contain PII, it
> > >> > > > > > > > >> relies on the upper layer to delete those records.
> > >> > > > > > > > >> For those infinite-retention use cases, Kafka needs to
> > >> > > > > > > > >> provide a way to enforce compaction on time. This is
> > >> > > > > > > > >> what we try to address in this KIP.
> > >> > > > > > > > >>
> > >> > > > > > > > >> Time-based retention:
> > >> > > > > > > > >>
> > >> > > > > > > > >> There are also use cases in which users of Kafka want
> > >> > > > > > > > >> to expire all their data. In those cases, they can use
> > >> > > > > > > > >> time-based retention on their topics.
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> Regarding your first question: if a user wants to
> > >> > > > > > > > >> delete a key in a log-compacted topic, the user has to
> > >> > > > > > > > >> send a deletion (tombstone) using the same key. Kafka
> > >> > > > > > > > >> only makes sure the deletion will happen within a
> > >> > > > > > > > >> certain time period (like 2 days/7 days).
> > >> > > > > > > > >>
> > >> > > > > > > > >> Regarding your second question: in most cases, we
> > >> > > > > > > > >> might want to delete all duplicated keys at the same
> > >> > > > > > > > >> time. Compaction might be more efficient, since we need
> > >> > > > > > > > >> to scan the log and find all duplicates anyway.
> > >> > > > > > > > >> However, the expected use case is to set the time-based
> > >> > > > > > > > >> compaction interval on the order of days, larger than
> > >> > > > > > > > >> the "min compaction lag". We don't want log compaction
> > >> > > > > > > > >> to happen frequently, since it is expensive. The
> > >> > > > > > > > >> purpose is to help low-production-rate topics get
> > >> > > > > > > > >> compacted on time. For topics with a "normal" incoming
> > >> > > > > > > > >> message rate, the "min dirty ratio" will likely have
> > >> > > > > > > > >> triggered compaction before this time-based compaction
> > >> > > > > > > > >> policy takes effect.
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> Eno,
> > >> > > > > > > > >>
> > >> > > > > > > > >> For your question: as I mentioned, we have long-term
> > >> > > > > > > > >> retention use cases for log-compacted topics, but we
> > >> > > > > > > > >> want to provide the ability to delete certain PII
> > >> > > > > > > > >> records on time.
> > >> > > > > > > > >> Kafka itself doesn't know whether a record contains
> > >> > > > > > > > >> sensitive information and relies on the user for
> > >> > > > > > > > >> deletion.
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> On Mon, Aug 13, 2018 at 6:58 PM, Dong Lin <lindon...@gmail.com>
> > >> > > > > > > > >> wrote:
> > >> > > > > > > > >>
> > >> > > > > > > > >>> Hey Xiongqi,
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> Thanks for the KIP. I have two questions regarding
> > >> > > > > > > > >>> the use case of meeting the GDPR requirement.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> 1) If I recall correctly, one of the GDPR requirements
> > >> > > > > > > > >>> is that we cannot keep messages longer than, e.g., 30
> > >> > > > > > > > >>> days in storage (e.g. Kafka). Say there exists a
> > >> > > > > > > > >>> partition p0 which contains message1 with key1 and
> > >> > > > > > > > >>> message2 with key2, and then the user keeps producing
> > >> > > > > > > > >>> messages with key=key2 to this partition. Since
> > >> > > > > > > > >>> message1 with key1 is never overridden, sooner or
> > >> > > > > > > > >>> later we will want to delete message1 while keeping
> > >> > > > > > > > >>> the latest message with key=key2. But currently it
> > >> > > > > > > > >>> looks like the log compaction logic in Kafka will
> > >> > > > > > > > >>> always put these messages in the same segment. Will
> > >> > > > > > > > >>> this be an issue?
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> 2) The current KIP intends to provide the capability
> > >> > > > > > > > >>> to delete a given message in a log-compacted topic.
> > >> > > > > > > > >>> Does such a use case also require Kafka to keep the
> > >> > > > > > > > >>> messages produced before the given message? If yes,
> > >> > > > > > > > >>> then we can probably just use
> > >> > > > > > > > >>> AdminClient.deleteRecords() or time-based log
> > >> > > > > > > > >>> retention to meet the use-case requirement. If no, do
> > >> > > > > > > > >>> you know what GDPR's requirement is on
> > >> > > > > > > > >>> time-to-deletion after a user explicitly requests the
> > >> > > > > > > > >>> deletion (e.g. 1 hour, 1 day, 7 days)?
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> Thanks,
> > >> > > > > > > > >>> Dong
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> On Mon, Aug 13, 2018 at 3:44 PM, xiongqi wu <xiongq...@gmail.com>
> > >> > > > > > > > >>> wrote:
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> > Hi Eno,
> > >> > > > > > > > >>> >
> > >> > > > > > > > >>> > The GDPR request we are getting here at LinkedIn is:
> > >> > > > > > > > >>> > if we get a request to delete a record via a
> > >> > > > > > > > >>> > tombstone (a record with a null value) for a key on
> > >> > > > > > > > >>> > a log-compacted topic, we want to delete the record
> > >> > > > > > > > >>> > via compaction within a given time period, like 2
> > >> > > > > > > > >>> > days (whatever is required by the policy).
> > >> > > > > > > > >>> >
> > >> > > > > > > > >>> > There might be other issues (such as orphan log
> > >> > > > > > > > >>> > segments under certain conditions) that lead to GDPR
> > >> > > > > > > > >>> > problems, but those are more like something we need
> > >> > > > > > > > >>> > to fix anyway, regardless of GDPR.
> > >> > > > > > > > >>> >
> > >> > > > > > > > >>> >
> > >> > > > > > > > >>> > -- Xiongqi (Wesley) Wu
> > >> > > > > > > > >>> >
> > >> > > > > > > > >>> > On Mon, Aug 13, 2018 at 2:56 PM, Eno Thereska <eno.there...@gmail.com>
> > >> > > > > > > > >>> > wrote:
> > >> > > > > > > > >>> >
> > >> > > > > > > > >>> > > Hello,
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > > Thanks for the KIP. I'd like to see a more
> > >> > > > > > > > >>> > > precise definition of which part of GDPR you are
> > >> > > > > > > > >>> > > targeting, as well as some sort of verification
> > >> > > > > > > > >>> > > that this KIP actually addresses the problem.
> > >> > > > > > > > >>> > > Right now I find this a bit vague:
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > > "Ability to delete a log message through
> > >> > > > > > > > >>> > > compaction in a timely manner has become an
> > >> > > > > > > > >>> > > important requirement in some use cases (e.g.,
> > >> > > > > > > > >>> > > GDPR)"
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > > Is there any guarantee that after this KIP the
> > >> > > > > > > > >>> > > GDPR problem is solved, or do we need to do
> > >> > > > > > > > >>> > > something else as well, e.g., more KIPs?
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > > Thanks
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > > Eno
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > > On Thu, Aug 9, 2018 at 4:18 PM, xiongqi wu <xiongq...@gmail.com>
> > >> > > > > > > > >>> > > wrote:
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> > > > Hi Kafka,
> > >> > > > > > > > >>> > > >
> > >> > > > > > > > >>> > > > This KIP tries to address the GDPR concern of
> > >> > > > > > > > >>> > > > fulfilling deletion requests on time, through
> > >> > > > > > > > >>> > > > time-based log compaction on a
> > >> > > > > > > > >>> > > > compaction-enabled topic:
> > >> > > > > > > > >>> > > >
> > >> > > > > > > > >>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy
> > >> > > > > > > > >>> > > >
> > >> > > > > > > > >>> > > > Any feedback will be appreciated.
> > >> > > > > > > > >>> > > >
> > >> > > > > > > > >>> > > >
> > >> > > > > > > > >>> > > > Xiongqi (Wesley) Wu
> > >> > > > > > > > >>> > > >
> > >> > > > > > > > >>> > >
> > >> > > > > > > > >>> >
> > >> > > > > > > > >>>
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> --
> > >> > > > > > > > >> Xiongqi (Wesley) Wu
> > >> > > > > > > > >>
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > --
> > >> > > > > > > > > Xiongqi (Wesley) Wu
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > --
> > >> > > > > > > > Xiongqi (Wesley) Wu
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > >
> > >> > > > > > > Brett Rann
> > >> > > > > > >
> > >> > > > > > > Senior DevOps Engineer
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Zendesk International Ltd
> > >> > > > > > >
> > >> > > > > > > 395 Collins Street, Melbourne VIC 3000 Australia
> > >> > > > > > >
> > >> > > > > > > Mobile: +61 (0) 418 826 017
> > >> > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Xiongqi (Wesley) Wu
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Xiongqi (Wesley) Wu
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Xiongqi (Wesley) Wu
> > >> >
> > >>
> > >>
> > >> --
> > >>
> > >> Brett Rann
> > >>
> > >> Senior DevOps Engineer
> > >>
> > >>
> > >> Zendesk International Ltd
> > >>
> > >> 395 Collins Street, Melbourne VIC 3000 Australia
> > >>
> > >> Mobile: +61 (0) 418 826 017
> > >>
> > >
> >
> >
> > --
> > Xiongqi (Wesley) Wu
> >
>
>
> --
>
> Brett Rann
>
> Senior DevOps Engineer
>
>
> Zendesk International Ltd
>
> 395 Collins Street, Melbourne VIC 3000 Australia
>
> Mobile: +61 (0) 418 826 017
>
