+1 (non-binding) from me on the interface. I'd like to see someone familiar
with the code comment on the approach, and note there are a couple of
different approaches: what's documented in the KIP, and what Xiaohe Dong was
working on here:
https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0

If you already have working code, Xiongqi Wu, could you share a PR? I'd be
happy to start testing.

On Tue, Aug 28, 2018 at 5:57 AM xiongqi wu <xiongq...@gmail.com> wrote:

> Hi All,
>
> Do you have any additional comments on this KIP?
>
>
> On Thu, Aug 16, 2018 at 9:17 PM, xiongqi wu <xiongq...@gmail.com> wrote:
>
> > On 2):
> > The offset map is built starting from the first dirty segment, but the
> > compaction itself starts from the beginning of the log partition. That's
> > how it ensures the deletion of tombstoned keys.
> > I will double-check tomorrow.
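> >
> > As a toy model (not Kafka's actual code) of why scanning from the
> > beginning removes old values for tombstoned keys, assuming the behavior
> > described above:
> >
> > import java.util.*;
> >
> > public class CompactionToy {
> >     record Entry(String key, String value) {}  // value == null is a tombstone
> >
> >     public static void main(String[] args) {
> >         List<Entry> log = List.of(
> >             new Entry("k1", "v1"), new Entry("k2", "v2"), new Entry("k1", null));
> >         // offset map: latest offset per key (built from the dirty section;
> >         // here the whole log, for simplicity)
> >         Map<String, Integer> latest = new HashMap<>();
> >         for (int i = 0; i < log.size(); i++) latest.put(log.get(i).key(), i);
> >         // scan from the beginning; keep only each key's latest record, so
> >         // k1's old value is dropped and only its tombstone survives (real
> >         // compaction also drops the tombstone later, after a grace period)
> >         List<Entry> compacted = new ArrayList<>();
> >         for (int i = 0; i < log.size(); i++)
> >             if (latest.get(log.get(i).key()) == i) compacted.add(log.get(i));
> >         System.out.println(compacted);
> >     }
> > }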
> >
> > Xiongqi (Wesley) Wu
> >
> >
> > On Thu, Aug 16, 2018 at 6:46 PM Brett Rann <br...@zendesk.com.invalid> wrote:
> >
> >> To just clarify a bit on 1: whether there's an external storage/DB isn't
> >> relevant here. Compacted topics allow a tombstone record to be sent (a
> >> null value for a key), which currently will result in old values for
> >> that key being deleted if some conditions are met. There are existing
> >> controls to make sure the old values will stay around for at least a
> >> minimum time, but no dedicated control to ensure the tombstone will
> >> delete within a maximum time.
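> >>
> >> For reference, a tombstone is just a produce of a null value for an
> >> existing key. A minimal sketch with the Java producer (topic and key
> >> names are made up):
> >>
> >> import java.util.Properties;
> >> import org.apache.kafka.clients.producer.*;
> >> import org.apache.kafka.common.serialization.ByteArraySerializer;
> >> import org.apache.kafka.common.serialization.StringSerializer;
> >>
> >> Properties p = new Properties();
> >> p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
> >> p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
> >> p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
> >> try (Producer<String, byte[]> producer = new KafkaProducer<>(p)) {
> >>     // null value marks key "user-123" for deletion once compaction runs
> >>     producer.send(new ProducerRecord<>("user-events", "user-123", null));
> >> }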
> >>
> >> One popular reason a maximum time for deletion is desirable right now is
> >> GDPR with PII. But we're not proposing any GDPR awareness in Kafka, just
> >> the ability to guarantee a max time within which a tombstoned key will
> >> be removed from the compacted topic.
> >>
> >> On 2):
> >> Huh, I thought it kept track of the first dirty segment and didn't
> >> recompact older "clean" ones. But I didn't look at the code or test that.
> >>
> >> On Fri, Aug 17, 2018 at 10:57 AM xiongqi wu <xiongq...@gmail.com> wrote:
> >>
> >> > 1) The owner of the data (in this sense, Kafka is not the owner of the
> >> > data) should keep track of the lifecycle of the data in some external
> >> > storage/DB. The owner determines when to delete the data and sends the
> >> > delete request to Kafka. Kafka doesn't know about the content of the
> >> > data; it only provides a means for deletion.
> >> >
> >> > 2) Each time compaction runs, it will start from the first segment (no
> >> > matter whether it is compacted or not). The time estimation here is
> >> > only used to determine whether we should run compaction on this log
> >> > partition, so we only need to estimate for uncompacted segments.
> >> >
> >> > On Thu, Aug 16, 2018 at 5:35 PM, Dong Lin <lindon...@gmail.com> wrote:
> >> >
> >> > > Hey Xiongqi,
> >> > >
> >> > > Thanks for the update. I have two questions for the latest KIP.
> >> > >
> >> > > 1) The motivation section says that one use case is to delete PII
> >> > > (Personally Identifiable Information) data within 7 days while
> >> > > keeping non-PII indefinitely in compacted format. I suppose the
> >> > > use case depends on the application to determine when to delete
> >> > > those PII data. Could you explain how the application can reliably
> >> > > determine the set of keys that should be deleted? Is the application
> >> > > required to always re-consume messages from the topic after every
> >> > > restart and determine the keys to be deleted by looking at message
> >> > > timestamps, or is the application supposed to persist the
> >> > > key -> timestamp information in a separate persistent storage system?
> >> > >
> >> > > 2) It is mentioned in the KIP that "we only need to estimate
> >> > > earliest message timestamp for un-compacted log segments because the
> >> > > deletion requests that belong to compacted segments have already
> >> > > been processed". Not sure if that is correct. If a segment is
> >> > > compacted before a user sends a message to delete a key in this
> >> > > segment, it seems that we still need to ensure that the segment will
> >> > > be compacted again within the given time after the deletion is
> >> > > requested, right?
> >> > >
> >> > > Thanks,
> >> > > Dong
> >> > >
> >> > > On Thu, Aug 16, 2018 at 10:27 AM, xiongqi wu <xiongq...@gmail.com> wrote:
> >> > >
> >> > > > Hi Xiaohe,
> >> > > >
> >> > > > Quick note:
> >> > > > 1) Use the minimum of segment.ms and max.compaction.lag.ms (see
> >> > > > the sketch of the roll trigger after these notes).
> >> > > >
> >> > > > 2) I am not sure I get your second question. First, we have jitter
> >> > > > when we roll the active segment. Second, on each compaction, we
> >> > > > compact up to whatever the offset map allows. Those factors will
> >> > > > not lead to a perfectly synchronized compaction storm over time.
> >> > > > In addition, I expect max.compaction.lag.ms to be set on the order
> >> > > > of days.
> >> > > >
> >> > > > 3) I don't have access to the Confluent community Slack for now.
> >> > > > I am reachable via Google Hangouts.
> >> > > > To avoid double effort, here is my plan:
> >> > > > a) Collect more feedback and feature requirements on the KIP.
> >> > > > b) Wait until this KIP is approved.
> >> > > > c) I will address any additional requirements in the
> >> > > > implementation. (My current implementation only complies with what
> >> > > > is described in the KIP now.)
> >> > > > d) I can share the code with you and the community to see if you
> >> > > > want to add anything.
> >> > > > e) Submission through a committer.
> >> > > >
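> >> > > > As a rough sketch of the roll trigger note 1 implies (illustrative
> >> > > > names, not actual Kafka code):
> >> > > >
> >> > > > // roll the active segment once it is older than the smaller of the
> >> > > > // two bounds, so a record cannot sit in the active segment past
> >> > > > // max.compaction.lag.ms
> >> > > > static boolean shouldRollActiveSegment(long segmentCreateMs,
> >> > > >         long segmentMs, long maxCompactionLagMs, long nowMs) {
> >> > > >     return nowMs - segmentCreateMs >= Math.min(segmentMs, maxCompactionLagMs);
> >> > > > }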
> >> > > >
> >> > > > On Wed, Aug 15, 2018 at 11:42 PM, XIAOHE DONG <dannyriv...@gmail.com> wrote:
> >> > > >
> >> > > > > Hi Xiongqi
> >> > > > >
> >> > > > > Thanks for thinking about implementing this as well. :)
> >> > > > >
> >> > > > > I was thinking about using `segment.ms` to trigger the segment
> >> > > > > roll. Also, its value can be the largest time bias for the record
> >> > > > > deletion. For example, if `segment.ms` is 1 day and
> >> > > > > `max.compaction.lag.ms` is 30 days, the compaction may happen
> >> > > > > around day 31.
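> >> > > > >
> >> > > > > A rough bound consistent with the example above (my arithmetic,
> >> > > > > as an assumption about the worst case):
> >> > > > >
> >> > > > > // record arrives just after the active segment opens, waits up
> >> > > > > // to segment.ms for the roll, then up to max.compaction.lag.ms
> >> > > > > // for the cleaner to pick the segment up
> >> > > > > long worstCaseDeletionDelayMs = segmentMs + maxCompactionLagMs;
> >> > > > > // e.g. 1 day + 30 days -> deletion around day 31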
> >> > > > >
> >> > > > > For my curiosity, is there a way we can do some performance
> >> > > > > testing for this, and are there any tools you can recommend? As
> >> > > > > you know, previously the log was cleaned up by respecting the
> >> > > > > dirty ratio, but now cleaning may happen at any time once the max
> >> > > > > lag has passed for a message. I wonder what would happen if
> >> > > > > clients send a huge amount of tombstone records at the same time.
> >> > > > >
> >> > > > > I am looking forward to having a quick chat with you to avoid
> >> > > > > double effort on this. I am in the Confluent community Slack
> >> > > > > during work hours. My name is Xiaohe Dong. :)
> >> > > > >
> >> > > > > Rgds
> >> > > > > Xiaohe Dong
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On 2018/08/16 01:22:22, xiongqi wu <xiongq...@gmail.com> wrote:
> >> > > > > > Brett,
> >> > > > > >
> >> > > > > > Thank you for your comments.
> >> > > > > > I was thinking that since we already have an
> >> > > > > > immediate-compaction setting (min dirty ratio set to 0), I
> >> > > > > > decided to use "0" as the disabled state. I am OK to go with
> >> > > > > > the -1 (disabled) / 0 (immediate) options.
> >> > > > > >
> >> > > > > > For the implementation, there are a few differences between
> >> > > > > > mine and Xiaohe Dong's:
> >> > > > > > 1) I used the estimated creation time of a log segment instead
> >> > > > > > of the largest timestamp of a log to determine compaction
> >> > > > > > eligibility, because a log segment might stay as the active
> >> > > > > > segment for up to the "max compaction lag" (see the KIP for
> >> > > > > > details).
> >> > > > > > 2) I measure how many bytes we must clean to comply with the
> >> > > > > > "max compaction lag" rule, and use that to determine the order
> >> > > > > > of compaction.
> >> > > > > > 3) I force the active segment to roll to comply with the "max
> >> > > > > > compaction lag".
> >> > > > > >
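> >> > > > > > For 1), a rough sketch of the eligibility test (illustrative
> >> > > > > > names, not the actual patch):
> >> > > > > >
> >> > > > > > // a segment must be compacted once its estimated creation time
> >> > > > > > // is older than the configured max compaction lag; <= 0
> >> > > > > > // disables the rule
> >> > > > > > static boolean mustCompact(long estimatedCreateTimeMs,
> >> > > > > >                            long maxCompactionLagMs, long nowMs) {
> >> > > > > >     return maxCompactionLagMs > 0
> >> > > > > >         && nowMs - estimatedCreateTimeMs > maxCompactionLagMs;
> >> > > > > > }
> >> > > > > >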
> >> > > > > > I can share my code so we can coordinate.
> >> > > > > >
> >> > > > > > I haven't thought about a new API to force a compaction. What
> >> > > > > > is the use case for that?
> >> > > > > >
> >> > > > > >
> >> > > > > > On Wed, Aug 15, 2018 at 5:33 PM, Brett Rann <br...@zendesk.com.invalid> wrote:
> >> > > > > >
> >> > > > > > > We've been looking into this too.
> >> > > > > > >
> >> > > > > > > Mailing list:
> >> > > > > > > https://lists.apache.org/thread.html/ed7f6a6589f94e8c2a705553f364ef599cb6915e4c3ba9b561e610e4@%3Cdev.kafka.apache.org%3E
> >> > > > > > > Jira wish: https://issues.apache.org/jira/browse/KAFKA-7137
> >> > > > > > > Confluent Slack discussion:
> >> > > > > > > https://confluentcommunity.slack.com/archives/C49R61XMM/p1530760121000039
> >> > > > > > >
> >> > > > > > > A person on my team has started on code so you might want to
> >> > > > > > > coordinate:
> >> > > > > > > https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0
> >> > > > > > >
> >> > > > > > > He's been working with Jason Gustafson and James Chen around
> >> the
> >> > > > > changes.
> >> > > > > > > You can ping him on confluent slack as Xiaohe Dong.
> >> > > > > > >
> >> > > > > > > It's great to know others are thinking on it as well.
> >> > > > > > >
> >> > > > > > > You've added the requirement to force a segment roll which
> we
> >> > > hadn't
> >> > > > > gotten
> >> > > > > > > to yet, which is great. I was content with it not including
> >> the
> >> > > > active
> >> > > > > > > segment.
> >> > > > > > >
> >> > > > > > > > Adding topic level configuration "max.compaction.lag.ms",
> >> > > > > > > > and corresponding broker configuration
> >> > > > > > > > "log.cleaner.max.compaction.lag.ms", which is set to 0
> >> > > > > > > > (disabled) by default.
> >> > > > > > >
> >> > > > > > > Glancing at some other settings, the convention seems to be
> >> > > > > > > -1 for disabled (or infinite, which is more meaningful here).
> >> > > > > > > 0 to me implies instant, a little quicker than 1.
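> >> > > > > > >
> >> > > > > > > For illustration, assuming the topic-level name from the KIP,
> >> > > > > > > setting it could look like this with the Java AdminClient
> >> > > > > > > (note alterConfigs in 2.0 replaces the topic's full config
> >> > > > > > > set; the topic name and 7-day value are made up):
> >> > > > > > >
> >> > > > > > > import java.util.Collections;
> >> > > > > > > import java.util.Properties;
> >> > > > > > > import java.util.concurrent.TimeUnit;
> >> > > > > > > import org.apache.kafka.clients.admin.*;
> >> > > > > > > import org.apache.kafka.common.config.ConfigResource;
> >> > > > > > >
> >> > > > > > > Properties props = new Properties();
> >> > > > > > > props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
> >> > > > > > > try (AdminClient admin = AdminClient.create(props)) {
> >> > > > > > >     ConfigResource topic =
> >> > > > > > >         new ConfigResource(ConfigResource.Type.TOPIC, "pii-topic");
> >> > > > > > >     Config cfg = new Config(Collections.singletonList(
> >> > > > > > >         new ConfigEntry("max.compaction.lag.ms",
> >> > > > > > >                         String.valueOf(TimeUnit.DAYS.toMillis(7)))));
> >> > > > > > >     admin.alterConfigs(Collections.singletonMap(topic, cfg)).all().get();
> >> > > > > > > }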
> >> > > > > > >
> >> > > > > > > We've been trying to think about a way to trigger compaction
> >> > > > > > > through an API call as well, which would need to be flagged
> >> > > > > > > somewhere (ZK admin/ space?), but we're struggling to think
> >> > > > > > > how that would be coordinated across brokers and partitions.
> >> > > > > > > Have you given any thought to that?
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Thu, Aug 16, 2018 at 8:44 AM xiongqi wu <xiongq...@gmail.com> wrote:
> >> > > > > > >
> >> > > > > > > > Eno, Dong,
> >> > > > > > > >
> >> > > > > > > > I have updated the KIP. We decided not to address the issue
> >> > > > > > > > that we might have for topics with both compaction and time
> >> > > > > > > > retention enabled (see rejected alternative item 2). This
> >> > > > > > > > KIP will only ensure a log can be compacted after a
> >> > > > > > > > specified time interval.
> >> > > > > > > >
> >> > > > > > > > As suggested by Dong, we will also enforce that
> >> > > > > > > > "max.compaction.lag.ms" is not less than
> >> > > > > > > > "min.compaction.lag.ms".
> >> > > > > > > >
> >> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354
> >> > > > > > > > (Time-based log compaction policy)
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Tue, Aug 14, 2018 at 5:01 PM, xiongqi wu <xiongq...@gmail.com> wrote:
> >> > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Per discussion with Dong: he made a very good point that
> >> > > > > > > > > if compaction and time-based retention are both enabled
> >> > > > > > > > > on a topic, the compaction might prevent records from
> >> > > > > > > > > being deleted on time. The reason is that when compacting
> >> > > > > > > > > multiple segments into one single segment, the newly
> >> > > > > > > > > created segment will have the same lastModified timestamp
> >> > > > > > > > > as the latest original segment. We lose the timestamps of
> >> > > > > > > > > all original segments except the last one. As a result,
> >> > > > > > > > > records might not be deleted as they should be through
> >> > > > > > > > > time-based retention.
> >> > > > > > > > >
> >> > > > > > > > > With the current KIP proposal, if we want to ensure
> >> > > > > > > > > timely deletion, we have the following configurations:
> >> > > > > > > > > 1) Enable time-based log compaction only: deletion is
> >> > > > > > > > > done through overriding the same key.
> >> > > > > > > > > 2) Enable time-based log retention only: deletion is done
> >> > > > > > > > > through time-based retention.
> >> > > > > > > > > 3) Enable both log compaction and time-based retention:
> >> > > > > > > > > deletion is not guaranteed.
> >> > > > > > > > >
> >> > > > > > > > > Not sure if we have use case 3 and also want deletion to
> >> > > > > > > > > happen on time. There are several options to address the
> >> > > > > > > > > deletion issue when both compaction and retention are
> >> > > > > > > > > enabled:
> >> > > > > > > > > A) During log compaction, look into the record timestamp
> >> > > > > > > > > to delete expired records. This can be done in the
> >> > > > > > > > > compaction logic itself or by using
> >> > > > > > > > > AdminClient.deleteRecords(), but it assumes we have
> >> > > > > > > > > record timestamps.
> >> > > > > > > > > B) Retain the lastModified time of the original segments
> >> > > > > > > > > during log compaction. This requires extra metadata to
> >> > > > > > > > > record the information, or not grouping multiple segments
> >> > > > > > > > > into one during compaction.
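> >> > > > > > > > >
> >> > > > > > > > > For A), a rough sketch of the deleteRecords() route,
> >> > > > > > > > > using a consumer to translate the cutoff timestamp into
> >> > > > > > > > > an offset (topic, partition, and 7-day cutoff are made
> >> > > > > > > > > up; assumes an already-configured 'admin' and 'consumer'):
> >> > > > > > > > >
> >> > > > > > > > > import java.util.Collections;
> >> > > > > > > > > import java.util.Map;
> >> > > > > > > > > import java.util.concurrent.TimeUnit;
> >> > > > > > > > > import org.apache.kafka.clients.admin.RecordsToDelete;
> >> > > > > > > > > import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
> >> > > > > > > > > import org.apache.kafka.common.TopicPartition;
> >> > > > > > > > >
> >> > > > > > > > > TopicPartition tp = new TopicPartition("pii-topic", 0);
> >> > > > > > > > > long cutoffMs = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7);
> >> > > > > > > > > // first offset with timestamp >= cutoff; all before it expired
> >> > > > > > > > > Map<TopicPartition, OffsetAndTimestamp> byTime =
> >> > > > > > > > >     consumer.offsetsForTimes(Collections.singletonMap(tp, cutoffMs));
> >> > > > > > > > > OffsetAndTimestamp cut = byTime.get(tp);
> >> > > > > > > > > if (cut != null) {
> >> > > > > > > > >     admin.deleteRecords(Collections.singletonMap(tp,
> >> > > > > > > > >         RecordsToDelete.beforeOffset(cut.offset()))).all().get();
> >> > > > > > > > > }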
> >> > > > > > > > >
> >> > > > > > > > > If we have use case 3 in general, I would prefer solution
> >> > > > > > > > > A and rely on record timestamps.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Two questions:
> >> > > > > > > > > Do we have use case 3? Is it nice-to-have or must-have?
> >> > > > > > > > > If we have use case 3 and want to go with solution A,
> >> > > > > > > > > should we introduce a new configuration to enforce
> >> > > > > > > > > deletion by timestamp?
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Tue, Aug 14, 2018 at 1:52 PM, xiongqi wu <xiongq...@gmail.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > >> Dong,
> >> > > > > > > > >>
> >> > > > > > > > >> Thanks for the comment.
> >> > > > > > > > >>
> >> > > > > > > > >> There are two retention policies: log compaction and
> >> > > > > > > > >> time-based retention.
> >> > > > > > > > >>
> >> > > > > > > > >> Log compaction:
> >> > > > > > > > >>
> >> > > > > > > > >> We have use cases that keep infinite retention of a
> >> > > > > > > > >> topic (compaction only). GDPR cares about deletion of
> >> > > > > > > > >> PII (personally identifiable information) data. Since
> >> > > > > > > > >> Kafka doesn't know which records contain PII, it relies
> >> > > > > > > > >> on the upper layer to delete those records. For those
> >> > > > > > > > >> infinite-retention use cases, Kafka needs to provide a
> >> > > > > > > > >> way to enforce compaction on time. This is what we try
> >> > > > > > > > >> to address in this KIP.
> >> > > > > > > > >>
> >> > > > > > > > >> Time-based retention:
> >> > > > > > > > >>
> >> > > > > > > > >> There are also use cases where users of Kafka might want
> >> > > > > > > > >> to expire all their data. In those cases, they can use
> >> > > > > > > > >> time-based retention on their topics.
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> Regarding your first question: if a user wants to delete
> >> > > > > > > > >> a key in a log-compacted topic, the user has to send a
> >> > > > > > > > >> deletion (tombstone) using the same key. Kafka only
> >> > > > > > > > >> makes sure the deletion will happen within a certain
> >> > > > > > > > >> time period (like 2 days / 7 days).
> >> > > > > > > > >>
> >> > > > > > > > >> Regarding your second question: in most cases, we might
> >> > > > > > > > >> want to delete all duplicated keys at the same time.
> >> > > > > > > > >> Compaction might be more efficient, since we need to
> >> > > > > > > > >> scan the log and find all duplicates anyway. However,
> >> > > > > > > > >> the expected use case is to set the time-based
> >> > > > > > > > >> compaction interval on the order of days, and larger
> >> > > > > > > > >> than the "min compaction lag". We don't want log
> >> > > > > > > > >> compaction to happen frequently, since it is expensive.
> >> > > > > > > > >> The purpose is to help low-production-rate topics get
> >> > > > > > > > >> compacted on time. For topics with a "normal" incoming
> >> > > > > > > > >> message rate, the "min dirty ratio" might have triggered
> >> > > > > > > > >> the compaction before this time-based compaction policy
> >> > > > > > > > >> takes effect.
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> Eno,
> >> > > > > > > > >>
> >> > > > > > > > >> For your question: as I mentioned, we have long-term
> >> > > > > > > > >> retention use cases for log-compacted topics, but we
> >> > > > > > > > >> want to provide the ability to delete certain PII
> >> > > > > > > > >> records on time. Kafka itself doesn't know whether a
> >> > > > > > > > >> record contains sensitive information and relies on the
> >> > > > > > > > >> user for deletion.
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> On Mon, Aug 13, 2018 at 6:58 PM, Dong Lin <lindon...@gmail.com> wrote:
> >> > > > > > > > >>
> >> > > > > > > > >>> Hey Xiongqi,
> >> > > > > > > > >>>
> >> > > > > > > > >>> Thanks for the KIP. I have two questions regarding the
> >> > > > > > > > >>> use case for meeting the GDPR requirements.
> >> > > > > > > > >>>
> >> > > > > > > > >>> 1) If I recall correctly, one of the GDPR requirements
> >> > > > > > > > >>> is that we cannot keep messages longer than e.g. 30
> >> > > > > > > > >>> days in storage (e.g. Kafka). Say there exists a
> >> > > > > > > > >>> partition p0 which contains message1 with key1 and
> >> > > > > > > > >>> message2 with key2, and then the user keeps producing
> >> > > > > > > > >>> messages with key=key2 to this partition. Since
> >> > > > > > > > >>> message1 with key1 is never overridden, sooner or later
> >> > > > > > > > >>> we will want to delete message1 and keep the latest
> >> > > > > > > > >>> message with key=key2. But currently it looks like the
> >> > > > > > > > >>> log compaction logic in Kafka will always put these
> >> > > > > > > > >>> messages in the same segment. Will this be an issue?
> >> > > > > > > > >>>
> >> > > > > > > > >>> 2) The current KIP intends to provide the capability to
> >> > > > > > > > >>> delete a given message in a log-compacted topic. Does
> >> > > > > > > > >>> such a use case also require Kafka to keep the messages
> >> > > > > > > > >>> produced before the given message? If yes, then we can
> >> > > > > > > > >>> probably just use AdminClient.deleteRecords() or
> >> > > > > > > > >>> time-based log retention to meet the use-case
> >> > > > > > > > >>> requirement. If no, do you know what GDPR's requirement
> >> > > > > > > > >>> is on time-to-deletion after a user explicitly requests
> >> > > > > > > > >>> the deletion (e.g. 1 hour, 1 day, 7 days)?
> >> > > > > > > > >>>
> >> > > > > > > > >>> Thanks,
> >> > > > > > > > >>> Dong
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> On Mon, Aug 13, 2018 at 3:44 PM, xiongqi wu <xiongq...@gmail.com> wrote:
> >> > > > > > > > >>>
> >> > > > > > > > >>> > Hi Eno,
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > The GDPR request we are getting here at LinkedIn is:
> >> > > > > > > > >>> > if we get a request to delete a record through a
> >> > > > > > > > >>> > tombstone (a null value for the key) on a
> >> > > > > > > > >>> > log-compacted topic, we want to delete the record via
> >> > > > > > > > >>> > compaction within a given time period, like 2 days
> >> > > > > > > > >>> > (whatever is required by the policy).
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > There might be other issues (such as orphan log
> >> > > > > > > > >>> > segments under certain conditions) that lead to GDPR
> >> > > > > > > > >>> > problems, but they are more like something we need to
> >> > > > > > > > >>> > fix anyway, regardless of GDPR.
> >> > > > > > > > >>> >
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > -- Xiongqi (Wesley) Wu
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > On Mon, Aug 13, 2018 at 2:56 PM, Eno Thereska <eno.there...@gmail.com> wrote:
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > > Hello,
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > Thanks for the KIP. I'd like to see a more precise
> >> > > > > > > > >>> > > definition of what part of GDPR you are targeting,
> >> > > > > > > > >>> > > as well as some sort of verification that this KIP
> >> > > > > > > > >>> > > actually addresses the problem. Right now I find
> >> > > > > > > > >>> > > this a bit vague:
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > "Ability to delete a log message through
> compaction
> >> in
> >> > a
> >> > > > > timely
> >> > > > > > > > >>> manner
> >> > > > > > > > >>> > has
> >> > > > > > > > >>> > > become an important requirement in some use cases
> >> > (e.g.,
> >> > > > > GDPR)"
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > Is there any guarantee that after this KIP the GDPR
> >> > > > > > > > >>> > > problem is solved, or do we need to do something
> >> > > > > > > > >>> > > else as well, e.g., more KIPs?
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > Thanks
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > Eno
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > On Thu, Aug 9, 2018 at 4:18 PM, xiongqi wu <xiongq...@gmail.com> wrote:
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > > Hi Kafka,
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > This KIP tries to address the GDPR concern of
> >> > > > > > > > >>> > > > fulfilling deletion requests on time through
> >> > > > > > > > >>> > > > time-based log compaction on a compaction-enabled
> >> > > > > > > > >>> > > > topic:
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > Any feedback will be appreciated.
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > Xiongqi (Wesley) Wu
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> >
> >> > > > > > > > >>>
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> --
> >> > > > > > > > >> Xiongqi (Wesley) Wu
> >> > > > > > > > >>
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > --
> >> > > > > > > > > Xiongqi (Wesley) Wu
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > Xiongqi (Wesley) Wu
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > --
> >> > > > > > >
> >> > > > > > > Brett Rann
> >> > > > > > >
> >> > > > > > > Senior DevOps Engineer
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Zendesk International Ltd
> >> > > > > > >
> >> > > > > > > 395 Collins Street, Melbourne VIC 3000 Australia
> >> > > > > > >
> >> > > > > > > Mobile: +61 (0) 418 826 017
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Xiongqi (Wesley) Wu
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Xiongqi (Wesley) Wu
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Xiongqi (Wesley) Wu
> >> >
> >>
> >>
> >> --
> >>
> >> Brett Rann
> >>
> >> Senior DevOps Engineer
> >>
> >>
> >> Zendesk International Ltd
> >>
> >> 395 Collins Street, Melbourne VIC 3000 Australia
> >>
> >> Mobile: +61 (0) 418 826 017
> >>
> >
>
>
> --
> Xiongqi (Wesley) Wu
>


-- 

Brett Rann

Senior DevOps Engineer


Zendesk International Ltd

395 Collins Street, Melbourne VIC 3000 Australia

Mobile: +61 (0) 418 826 017
