Hi Dong,
I have updated the KIP to address your comments.
One correction to the previous email:
after an offline discussion with Dong, we decided to use MAX_LONG as the
default value for max.compaction.lag.ms.


Xiongqi (Wesley) Wu


On Mon, Oct 29, 2018 at 12:15 PM xiongqi wu <xiongq...@gmail.com> wrote:

> Hi Dong,
>
> Thank you for your comment.  See my inline comments.
> I will update the KIP shortly.
>
> Xiongqi (Wesley) Wu
>
>
> On Sun, Oct 28, 2018 at 9:17 PM Dong Lin <lindon...@gmail.com> wrote:
>
>> Hey Xiongqi,
>>
>> Sorry for late reply. I have some comments below:
>>
>> 1) As discussed earlier in the email list, if the topic is configured with
>> both deletion and compaction, in some cases messages produced a long time
>> ago cannot be deleted based on time. This is a valid use-case because we
>> actually have topics configured with both deletion and compaction
>> policies, and we should enforce the semantics of both. Solution A sounds
>> good. We do not need an interface change (e.g. an extra config) to enforce
>> solution A. All we need is to update the implementation so that when the
>> broker compacts a topic, if the messages have timestamps (which is the
>> common case), messages that are too old (based on the time-based retention
>> config) will be discarded. Since this is a valid issue and it is also
>> related to the guarantee of when a message can be deleted, can we include
>> the solution to this problem in the KIP?
>>
> ======  This makes sense.  We can use a similar approach to increase the
> log start offset.
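The agreed approach in 1) can be sketched as follows. This is an illustrative Python sketch, not Kafka's actual cleaner code; the function name, record layout, and retention value are all hypothetical:

```python
RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # hypothetical retention.ms for the topic

def compact_with_retention(records, now_ms, retention_ms=RETENTION_MS):
    """Sketch of 'solution A': while compacting, discard any record whose
    timestamp falls outside the time-based retention window, then keep
    only the latest value per key among the survivors."""
    latest = {}
    for key, value, ts in records:  # records are scanned in offset order
        if ts < now_ms - retention_ms:
            continue  # expired: discard even the latest value for the key
        latest[key] = (value, ts)
    return latest
```

The point of the sketch is that the retention check runs inside the compaction pass itself, so no extra config or interface change is needed.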
>
>>
>> 2) It is probably OK to assume that all messages have a timestamp. The
>> per-message timestamp was introduced in Kafka 0.10.0 with KIP-31 and
>> KIP-32 as of Feb 2016. Kafka 0.10.0 and earlier versions are no longer
>> supported. Also, since the use-case for this feature is primarily for
>> GDPR, we can assume that the client library has already been upgraded to
>> support SSL, a feature that was added after KIP-31 and KIP-32.
>>
>  =========>  Ok. We can use the message timestamp to delete expired
> records if both compaction and retention are enabled.
>
>
>> 3) In Proposed Change section 2.a, it is said that
>> segment.largestTimestamp - maxSegmentMs can be used to determine the
>> timestamp of the earliest message. Would it be simpler to just use the
>> creation time of the file to determine the time?
>>
> ========>  Linux/Java doesn't provide an API for file creation time
> because some filesystem types don't record it.
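The fallback estimate discussed here can be sketched as follows (illustrative Python; the function and parameter names are hypothetical, not Kafka's API):

```python
def estimate_earliest_timestamp(first_message_ts, largest_ts, max_segment_ms):
    """Sketch of the KIP's estimate for an un-compacted segment: use the
    first message's timestamp when present; otherwise fall back to
    segment.largestTimestamp - maxSegmentMs. The fallback can only
    under-estimate, so a segment is never picked up later than required."""
    if first_message_ts is not None:
        return first_message_ts
    return largest_ts - max_segment_ms
```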
>
>
>> 4) The KIP suggests using must-clean-ratio to select the partition to be
>> compacted. Unlike the dirty ratio, which is mostly for performance, the
>> logs whose "must-clean-ratio" is non-zero must be compacted immediately
>> for correctness reasons (and for GDPR). And if this cannot be achieved
>> because e.g. broker compaction throughput is too low, investigation will
>> be needed. So it seems simpler to first compact logs that have a segment
>> whose earliest timestamp is earlier than now - max.compaction.lag.ms,
>> instead of defining must-clean-ratio and sorting logs based on this value.
>>
>>
> ======>  Good suggestion. This can simplify the implementation quite a bit
> if we are not too concerned about compaction of a GDPR-required partition
> being queued behind some large partition.  The actual compaction
> completion time is not guaranteed anyway.
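The simplified selection rule from 4) can be sketched like this (illustrative Python; names are hypothetical):

```python
def logs_due_for_compaction(earliest_ts_by_log, now_ms, max_compaction_lag_ms):
    """Sketch of the simplification: select every log that has a segment
    whose estimated earliest timestamp is older than
    now - max.compaction.lag.ms, instead of computing and sorting a
    'must-clean-ratio'. `earliest_ts_by_log` maps a log name to the
    estimated earliest timestamp of its first un-compacted segment."""
    deadline = now_ms - max_compaction_lag_ms
    return sorted(name for name, ts in earliest_ts_by_log.items()
                  if ts < deadline)
```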
>
>
>> 5) The KIP says max.compaction.lag.ms is 0 by default, and it is also
>> suggested that 0 means disabled. Should we set this value to MAX_LONG by
>> default to effectively disable the feature added in this KIP?
>>
> ====> I would rather use 0 so the corresponding code path will not be
> exercised.  By using MAX_LONG, we would theoretically go through the
> related code to find out whether the partition is required to be
> compacted to satisfy MAX_LONG.
>
>> 6) It is probably cleaner and more readable not to include in the Public
>> Interface section those configs whose meaning is not changed.
>>
>> ====> I will clean that up.
>
>> 7) The goal of this KIP is to ensure that log segments whose earliest
>> message is earlier than a given threshold will be compacted. This goal
>> may not be achieved if the compaction throughput cannot catch up with the
>> total bytes-in rate for the compacted topics on the broker. Thus we need
>> an easy way to tell the operator whether this goal is achieved. If we
>> don't already have such metrics, maybe we can include metrics to show 1)
>> the total number of log segments (or logs) which need to be compacted
>> immediately as determined by max.compaction.lag; and 2) the maximum value
>> of now - earliest_timestamp_of_segment among all segments that need to be
>> compacted.
>>
> =======> Good suggestion.  I will update the KIP with these metrics.
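The two metrics suggested in 7) could be computed along these lines (illustrative Python sketch; the function name and input shape are hypothetical, not the actual JMX wiring):

```python
def compaction_lag_metrics(earliest_segment_ts, now_ms, max_compaction_lag_ms):
    """Sketch of the proposed metrics: (1) the number of segments that
    must be compacted immediately under max.compaction.lag, and (2) the
    maximum of now - earliest_timestamp among those overdue segments.
    `earliest_segment_ts` is a list of estimated earliest timestamps."""
    lags = [now_ms - ts for ts in earliest_segment_ts
            if now_ms - ts > max_compaction_lag_ms]
    return len(lags), max(lags, default=0)
```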
>
>> 8) The Performance Impact section suggests that users use the existing
>> metrics to monitor the performance impact of this KIP. It is useful to
>> list the meaning of each JMX metric that we want users to monitor, and
>> possibly explain how to interpret the values of these metrics to
>> determine whether there is a performance issue.
>>
>> =========>  I will update the KIP.
>
>> Thanks,
>> Dong
>>
>> On Tue, Oct 16, 2018 at 10:53 AM xiongqi wu <xiongq...@gmail.com> wrote:
>>
>> > Mayuresh,
>> >
>> > Thanks for the comments.
>> > The requirement is that we need to pick up segments that are older than
>> > maxCompactionLagMs for compaction.
>> > maxCompactionLagMs is an upper bound, which implies that picking up
>> > segments for compaction earlier doesn't violate the policy.
>> > We use the creation time of a segment as an estimate of its records'
>> > arrival time, so these records can be compacted no later than
>> > maxCompactionLagMs.
>> >
>> > On the other hand, compaction is an expensive operation; we don't want
>> > to compact the log partition whenever a new segment is sealed.
>> > Therefore, we want to pick up a segment for compaction when the segment
>> > is close to the mandatory max compaction lag (so we use segment
>> > creation time as an estimate).
>> >
>> >
>> > Xiongqi (Wesley) Wu
>> >
>> >
>> > On Mon, Oct 15, 2018 at 5:54 PM Mayuresh Gharat <gharatmayures...@gmail.com>
>> > wrote:
>> >
>> > > Hi Wesley,
>> > >
>> > > Thanks for the KIP and sorry for being late to the party.
>> > >  I wanted to understand the scenario you mentioned in Proposed
>> > > changes:
>> > >
>> > > >
>> > > > Estimate the earliest message timestamp of an un-compacted log
>> > > > segment. We only need to estimate the earliest message timestamp
>> > > > for un-compacted log segments to ensure timely compaction, because
>> > > > the deletion requests that belong to compacted segments have
>> > > > already been processed.
>> > > >
>> > > >    1. For the first (earliest) log segment: the estimated earliest
>> > > >    timestamp is set to the timestamp of the first message if a
>> > > >    timestamp is present in the message. Otherwise, the estimated
>> > > >    earliest timestamp is set to "segment.largestTimestamp -
>> > > >    maxSegmentMs" (segment.largestTimestamp is the lastModified
>> > > >    time of the log segment or the max timestamp we see for the log
>> > > >    segment). In the latter case, the actual timestamp of the first
>> > > >    message might be later than the estimation, but it is safe to
>> > > >    pick up the log for compaction earlier.
>> > > >
>> > > When we say "actual timestamp of the first message might be later
>> > > than the estimation, but it is safe to pick up the log for compaction
>> > > earlier", doesn't that violate the assumption that we will consider a
>> > > segment for compaction only if the creation time of the segment has
>> > > crossed "now - maxCompactionLagMs"?
>> > >
>> > > Thanks,
>> > >
>> > > Mayuresh
>> > >
>> > > On Mon, Sep 3, 2018 at 7:28 PM Brett Rann <br...@zendesk.com.invalid>
>> > > wrote:
>> > >
>> > > > Might also be worth moving to a vote thread? Discussion seems to
>> > > > have gone as far as it can.
>> > > >
>> > > > > On 4 Sep 2018, at 12:08, xiongqi wu <xiongq...@gmail.com> wrote:
>> > > > >
>> > > > > Brett,
>> > > > >
>> > > > > Yes, I will post PR tomorrow.
>> > > > >
>> > > > > Xiongqi (Wesley) Wu
>> > > > >
>> > > > >
>> > > > > On Sun, Sep 2, 2018 at 6:28 PM Brett Rann <br...@zendesk.com.invalid>
>> > > > > wrote:
>> > > > >
>> > > > > > +1 (non-binding) from me on the interface. I'd like to see
>> > > > > > someone familiar with the code comment on the approach, and
>> > > > > > note there's a couple of different approaches: what's
>> > > > > > documented in the KIP, and what Xiaohe Dong was working on
>> > > > > > here:
>> > > > > >
>> > > > > > https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0
>> > > > > >
>> > > > > > If you have code working already, Xiongqi Wu, could you share
>> > > > > > a PR? I'd be happy to start testing.
>> > > > > >
>> > > > > On Tue, Aug 28, 2018 at 5:57 AM xiongqi wu <xiongq...@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi All,
>> > > > > > >
>> > > > > > > Do you have any additional comments on this KIP?
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Aug 16, 2018 at 9:17 PM, xiongqi wu <xiongq...@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > on 2)
>> > > > > > > > The offset map is built starting from the dirty segments.
>> > > > > > > > The compaction starts from the beginning of the log
>> > > > > > > > partition. That's how it ensures the deletion of tombstone
>> > > > > > > > keys.
>> > > > > > > > I will double check tomorrow.
>> > > > > > > >
>> > > > > > > > Xiongqi (Wesley) Wu
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Thu, Aug 16, 2018 at 6:46 PM Brett Rann <br...@zendesk.com.invalid>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > >> To just clarify a bit on 1: whether there's an external
>> > > > > > > >> storage/DB isn't relevant here. Compacted topics allow a
>> > > > > > > >> tombstone record to be sent (a null value for a key),
>> > > > > > > >> which currently will result in old values for that key
>> > > > > > > >> being deleted if some conditions are met. There are
>> > > > > > > >> existing controls to make sure the old values will stay
>> > > > > > > >> around for a minimum time at least, but no dedicated
>> > > > > > > >> control to ensure the tombstone will delete within a
>> > > > > > > >> maximum time.
>> > > > > > > >>
>> > > > > > > >> One popular reason that a maximum time for deletion is
>> > > > > > > >> desirable right now is GDPR with PII. But we're not
>> > > > > > > >> proposing any GDPR awareness in Kafka, just being able to
>> > > > > > > >> guarantee a max time after which a tombstoned key will be
>> > > > > > > >> removed from the compacted topic.
>> > > > > > > >>
>> > > > > > > >> on 2)
>> > > > > > > >> Huh, I thought it kept track of the first dirty segment
>> > > > > > > >> and didn't recompact older "clean" ones.
>> > > > > > > >> But I didn't look at code or test for that.
>> > > > > > > >>
>> > > > > > > >> On Fri, Aug 17, 2018 at 10:57 AM xiongqi wu <xiongq...@gmail.com>
>> > > > > > > >> wrote:
>> > > > > > > >>
>> > > > > > > >> > 1) The owner of the data (in this sense, Kafka is not
>> > > > > > > >> > the owner of the data) should keep track of the
>> > > > > > > >> > lifecycle of the data in some external storage/DB. The
>> > > > > > > >> > owner determines when to delete the data and sends the
>> > > > > > > >> > delete request to Kafka. Kafka doesn't know about the
>> > > > > > > >> > content of the data; it only provides a means for
>> > > > > > > >> > deletion.
>> > > > > > > >> >
>> > > > > > > >> > 2) Each time compaction runs, it will start from the
>> > > > > > > >> > first segment (no matter whether it is compacted or
>> > > > > > > >> > not). The time estimation here is only used to determine
>> > > > > > > >> > whether we should run compaction on this log partition,
>> > > > > > > >> > so we only need to estimate un-compacted segments.
>> > > > > > > >> >
>> > > > > > > >> > On Thu, Aug 16, 2018 at 5:35 PM, Dong Lin <lindon...@gmail.com>
>> > > > > > > >> > wrote:
>> > > > > > > >> >
>> > > > > > > >> > > Hey Xiongqi,
>> > > > > > > >> > >
>> > > > > > > >> > > Thanks for the update. I have two questions for the
>> latest
>> > > > KIP.
>> > > > > > > >> > >
>> > > > > > > >> > > 1) The motivation section says that one use case is
>> > > > > > > >> > > to delete PII (Personally Identifiable Information)
>> > > > > > > >> > > data within 7 days while keeping non-PII indefinitely
>> > > > > > > >> > > in compacted format. I suppose the use-case depends
>> > > > > > > >> > > on the application to determine when to delete that
>> > > > > > > >> > > PII data. Could you explain how the application can
>> > > > > > > >> > > reliably determine the set of keys that should be
>> > > > > > > >> > > deleted? Is the application required to always read
>> > > > > > > >> > > messages from the topic after every restart and
>> > > > > > > >> > > determine the keys to be deleted by looking at message
>> > > > > > > >> > > timestamps, or is the application supposed to persist
>> > > > > > > >> > > the key -> timestamp information in a separate
>> > > > > > > >> > > persistent storage system?
>> > > > > > > >> > >
>> > > > > > > >> > > 2) It is mentioned in the KIP that "we only need to
>> > > > > > > >> > > estimate the earliest message timestamp for
>> > > > > > > >> > > un-compacted log segments because the deletion
>> > > > > > > >> > > requests that belong to compacted segments have
>> > > > > > > >> > > already been processed". Not sure if it is correct.
>> > > > > > > >> > > If a segment is compacted before the user sends a
>> > > > > > > >> > > message to delete a key in this segment, it seems
>> > > > > > > >> > > that we still need to ensure that the segment will be
>> > > > > > > >> > > compacted again within the given time after the
>> > > > > > > >> > > deletion is requested, right?
>> > > > > > > >> > >
>> > > > > > > >> > > Thanks,
>> > > > > > > >> > > Dong
>> > > > > > > >> > >
>> > > > > > > >> > > On Thu, Aug 16, 2018 at 10:27 AM, xiongqi wu <xiongq...@gmail.com>
>> > > > > > > >> > > wrote:
>> > > > > > > >> > >
>> > > > > > > >> > > > Hi Xiaohe,
>> > > > > > > >> > > >
>> > > > > > > >> > > > Quick note:
>> > > > > > > >> > > > 1) Use the minimum of segment.ms and
>> > > > > > > >> > > > max.compaction.lag.ms.
>> > > > > > > >> > > >
>> > > > > > > >> > > > 2) I am not sure if I get your second question.
>> > > > > > > >> > > > First, we have jitter when we roll the active
>> > > > > > > >> > > > segment. Second, on each compaction, we compact up
>> > > > > > > >> > > > to what the offset map allows. Those will not lead
>> > > > > > > >> > > > to a perfect compaction storm over time. In
>> > > > > > > >> > > > addition, I expect we are setting
>> > > > > > > >> > > > max.compaction.lag.ms on the order of days.
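The "compact up to what the offset map allows" bound can be sketched as follows. This is an illustrative Python sketch only; Kafka's real cleaner uses a fixed-size hash-based offset map, and the names here are hypothetical:

```python
def fill_offset_map(dirty_records, map_capacity):
    """Sketch of bounding one compaction round by offset-map capacity:
    scan dirty records in offset order, recording the latest offset per
    key, and stop before the map would exceed its capacity. Returns the
    map and the last offset covered by this round."""
    offset_map = {}
    last_offset = None
    for offset, key in dirty_records:
        if key not in offset_map and len(offset_map) == map_capacity:
            break  # map full: this round ends at the previous offset
        offset_map[key] = offset
        last_offset = offset
    return offset_map, last_offset
```

Because each round covers only as much of the dirty section as the map can hold, a single enormous partition cannot turn one trigger into one unbounded compaction pass.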
>> > > > > > > >> > > >
>> > > > > > > >> > > > 3) I don't have access to the Confluent community
>> > > > > > > >> > > > Slack for now. I am reachable via Google Hangouts.
>> > > > > > > >> > > > To avoid the double effort, here is my plan:
>> > > > > > > >> > > > a) Collect more feedback and feature requirements
>> > > > > > > >> > > > on the KIP.
>> > > > > > > >> > > > b) Wait until this KIP is approved.
>> > > > > > > >> > > > c) I will address any additional requirements in
>> > > > > > > >> > > > the implementation. (My current implementation only
>> > > > > > > >> > > > complies with what is described in the KIP now.)
>> > > > > > > >> > > > d) I can share the code with you and the community
>> > > > > > > >> > > > to see if you want to add anything.
>> > > > > > > >> > > > e) Submission through committee.
>> > > > > > > >> > > >
>> > > > > > > >> > > >
>> > > > > > > >> > > > On Wed, Aug 15, 2018 at 11:42 PM, XIAOHE DONG <dannyriv...@gmail.com>
>> > > > > > > >> > > > wrote:
>> > > > > > > >> > > >
>> > > > > > > >> > > > > Hi Xiongqi
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > Thanks for thinking about implementing this as
>> well.
>> > :)
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > I was thinking about using `segment.ms` to
>> > > > > > > >> > > > > trigger the segment roll. Also, its value can be
>> > > > > > > >> > > > > the largest time bias for the record deletion.
>> > > > > > > >> > > > > For example, if `segment.ms` is 1 day and
>> > > > > > > >> > > > > `max.compaction.ms` is 30 days, the compaction
>> > > > > > > >> > > > > may happen around day 31.
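The arithmetic in the example above can be made explicit (illustrative Python; the bound is approximate, since roll jitter and cleaner scheduling add some slack):

```python
DAY_MS = 24 * 60 * 60 * 1000

def worst_case_deletion_lag_ms(segment_ms, max_compaction_lag_ms):
    """Illustrative upper bound: a record may wait up to segment.ms in
    the active segment before the segment rolls, after which the sealed
    segment must be compacted within max.compaction.lag.ms."""
    return segment_ms + max_compaction_lag_ms
```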
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > For my curiosity, is there a way we can do some
>> > > > > > > >> > > > > performance testing for this, and are there any
>> > > > > > > >> > > > > tools you can recommend? As you know, previously
>> > > > > > > >> > > > > a log is cleaned by respecting the dirty ratio,
>> > > > > > > >> > > > > but now cleaning may happen anytime the max lag
>> > > > > > > >> > > > > has passed for a message. I wonder what would
>> > > > > > > >> > > > > happen if clients send a huge amount of tombstone
>> > > > > > > >> > > > > records at the same time.
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > I am looking forward to having a quick chat with
>> > > > > > > >> > > > > you to avoid double effort on this. I am in the
>> > > > > > > >> > > > > Confluent community Slack during work time. My
>> > > > > > > >> > > > > name is Xiaohe Dong. :)
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > Rgds
>> > > > > > > >> > > > > Xiaohe Dong
>> > > > > > > >> > > > >
>> > > > > > > >> > > > >
>> > > > > > > >> > > > >
>> > > > > > > >> > > > > On 2018/08/16 01:22:22, xiongqi wu <xiongq...@gmail.com>
>> > > > > > > >> > > > > wrote:
>> > > > > > > >> > > > > > Brett,
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > > > Thank you for your comments.
>> > > > > > > >> > > > > > I was thinking that since we already have an
>> > > > > > > >> > > > > > immediate-compaction setting (by setting the min
>> > > > > > > >> > > > > > dirty ratio to 0), I decided to use "0" as the
>> > > > > > > >> > > > > > disabled state.
>> > > > > > > >> > > > > > I am OK to go with the -1 (disabled), 0
>> > > > > > > >> > > > > > (immediate) options.
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > > > For the implementation, there are a few
>> > > > > > > >> > > > > > differences between mine and Xiaohe Dong's:
>> > > > > > > >> > > > > > 1) I used the estimated creation time of a log
>> > > > > > > >> > > > > > segment instead of the largest timestamp of a
>> > > > > > > >> > > > > > log to determine compaction eligibility,
>> > > > > > > >> > > > > > because a log segment might stay as the active
>> > > > > > > >> > > > > > segment for up to the "max compaction lag" (see
>> > > > > > > >> > > > > > the KIP for details).
>> > > > > > > >> > > > > > 2) I measure how many bytes we must clean to
>> > > > > > > >> > > > > > follow the "max compaction lag" rule, and use
>> > > > > > > >> > > > > > that to determine the order of compaction.
>> > > > > > > >> > > > > > 3) I force the active segment to roll to follow
>> > > > > > > >> > > > > > the "max compaction lag".
>> > > > > > > >> > > > > > I can share my code so we can coordinate.
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > > > I haven't thought about a new API to force a
>> > > > > > > >> > > > > > compaction. What is the use case for that?
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > > > On Wed, Aug 15, 2018 at 5:33 PM, Brett Rann <br...@zendesk.com.invalid>
>> > > > > > > >> > > > > > wrote:
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > > > > We've been looking into this too.
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > Mailing list:
>> > > > > > > >> > > > > > > https://lists.apache.org/thread.html/ed7f6a6589f94e8c2a705553f364ef599cb6915e4c3ba9b561e610e4@%3Cdev.kafka.apache.org%3E
>> > > > > > > >> > > > > > > JIRA wish: https://issues.apache.org/jira/browse/KAFKA-7137
>> > > > > > > >> > > > > > > Confluent Slack discussion:
>> > > > > > > >> > > > > > > https://confluentcommunity.slack.com/archives/C49R61XMM/p1530760121000039
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > A person on my team has started on code, so
>> > > > > > > >> > > > > > > you might want to coordinate:
>> > > > > > > >> > > > > > > https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > He's been working with Jason Gustafson and
>> James
>> > > Chen
>> > > > > > around
>> > > > > > > >> the
>> > > > > > > >> > > > > changes.
>> > > > > > > >> > > > > > > You can ping him on confluent slack as Xiaohe
>> > Dong.
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > It's great to know others are thinking on it
>> > > > > > > >> > > > > > > as well.
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > You've added the requirement to force a
>> > > > > > > >> > > > > > > segment roll, which we hadn't gotten to yet,
>> > > > > > > >> > > > > > > which is great. I was content with it not
>> > > > > > > >> > > > > > > including the active segment.
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > > Adding topic level configuration
>> > > > > > > >> > > > > > > > "max.compaction.lag.ms", and corresponding
>> > > > > > > >> > > > > > > > broker configuration
>> > > > > > > >> > > > > > > > "log.cleaner.max.compaction.lag.ms", which
>> > > > > > > >> > > > > > > > is set to 0 (disabled) by default.
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > Glancing at some other settings, the
>> > > > > > > >> > > > > > > convention seems to me to be -1 for disabled
>> > > > > > > >> > > > > > > (or infinite, which is more meaningful here).
>> > > > > > > >> > > > > > > 0 to me implies instant, a little quicker
>> > > > > > > >> > > > > > > than 1.
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > We've been trying to think about a way to
>> > > > > > > >> > > > > > > trigger compaction as well through an API
>> > > > > > > >> > > > > > > call, which would need to be flagged
>> > > > > > > >> > > > > > > somewhere (ZK admin space?), but we're
>> > > > > > > >> > > > > > > struggling to think how that would be
>> > > > > > > >> > > > > > > coordinated across brokers and partitions.
>> > > > > > > >> > > > > > > Have you given any thought to that?
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > On Thu, Aug 16, 2018 at 8:44 AM xiongqi wu <xiongq...@gmail.com>
>> > > > > > > >> > > > > > > wrote:
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > > Eno, Dong,
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > > I have updated the KIP. We decided not to
>> > > > > > > >> > > > > > > > address the issue that we might have for
>> > > > > > > >> > > > > > > > topics with both compaction and time-based
>> > > > > > > >> > > > > > > > retention enabled (see rejected alternative
>> > > > > > > >> > > > > > > > item 2). This KIP will only ensure that a
>> > > > > > > >> > > > > > > > log can be compacted after a specified time
>> > > > > > > >> > > > > > > > interval.
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > > As suggested by Dong, we will also enforce
>> > > > > > > >> > > > > > > > that "max.compaction.lag.ms" is not less
>> > > > > > > >> > > > > > > > than "min.compaction.lag.ms".
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354
>> > > > > > > >> > > > > > > > Time-based log compaction policy
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > > On Tue, Aug 14, 2018 at 5:01 PM, xiongqi wu <xiongq...@gmail.com>
>> > > > > > > >> > > > > > > > wrote:
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > > Per discussion with Dong, he made a very
>> > > > > > > >> > > > > > > > > good point: if compaction and time-based
>> > > > > > > >> > > > > > > > > retention are both enabled on a topic,
>> > > > > > > >> > > > > > > > > the compaction might prevent records from
>> > > > > > > >> > > > > > > > > being deleted on time. The reason is that
>> > > > > > > >> > > > > > > > > when compacting multiple segments into
>> > > > > > > >> > > > > > > > > one single segment, the newly created
>> > > > > > > >> > > > > > > > > segment will have the same lastModified
>> > > > > > > >> > > > > > > > > timestamp as the latest original segment.
>> > > > > > > >> > > > > > > > > We lose the timestamps of all original
>> > > > > > > >> > > > > > > > > segments except the last one. As a
>> > > > > > > >> > > > > > > > > result, records might not be deleted as
>> > > > > > > >> > > > > > > > > they should be through time-based
>> > > > > > > >> > > > > > > > > retention.
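The timestamp-loss problem described here can be sketched minimally (illustrative Python; the function name is hypothetical):

```python
def merged_last_modified(original_last_modified_times):
    """Sketch of the problem: a compacted segment keeps only the newest
    lastModified time of the segments it merged, so time-based retention
    (which looks at lastModified) can no longer tell how old the earliest
    surviving records really are."""
    return max(original_last_modified_times)
```

Every original segment except the newest effectively inherits a younger age, which is exactly why records can outlive their retention window.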
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > > With the current KIP proposal, if we want
>> > > > > > > >> > > > > > > > > to ensure timely deletion, we have the
>> > > > > > > >> > > > > > > > > following configurations:
>> > > > > > > >> > > > > > > > > 1) Enable time-based log compaction only:
>> > > > > > > >> > > > > > > > > deletion is done through overriding the
>> > > > > > > >> > > > > > > > > same key.
>> > > > > > > >> > > > > > > > > 2) Enable time-based log retention only:
>> > > > > > > >> > > > > > > > > deletion is done through time-based
>> > > > > > > >> > > > > > > > > retention.
>> > > > > > > >> > > > > > > > > 3) Enable both log compaction and
>> > > > > > > >> > > > > > > > > time-based retention: deletion is not
>> > > > > > > >> > > > > > > > > guaranteed.
>> > > > > > > >> > > > > > > > > Not sure if we have use case 3 and also
>> want
>> > > > deletion
>> > > > > > to
>> > > > > > > >> > happen
>> > > > > > > >> > > > on
>> > > > > > > >> > > > > > > time.
>> > > > > > > >> > > > > > > > > There are several options to address
>> deletion
>> > > > issue
>> > > > > > when
>> > > > > > > >> > enable
>> > > > > > > >> > > > > both
>> > > > > > > >> > > > > > > > > compaction and retention:
>> > > > > > > >> > > > > > > > > A) During log compaction, looking into
>> record
>> > > > > > timestamp
>> > > > > > > to
>> > > > > > > >> > > delete
>> > > > > > > >> > > > > > > expired
>> > > > > > > >> > > > > > > > > records. This can be done in compaction
>> logic
>> > > > itself
>> > > > > > or
>> > > > > > > >> use
>> > > > > > > >> > > > > > > > > AdminClient.deleteRecords() . But this
>> assumes
>> > > we
>> > > > have
>> > > > > > > >> record
>> > > > > > > >> > > > > > > timestamp.
>> > > > > > > >> > > > > > > > > B) retain the lastModifed time of original
>> > > > segments
>> > > > > > > during
>> > > > > > > >> > log
>> > > > > > > >> > > > > > > > compaction.
>> > > > > > > >> > > > > > > > > This requires extra meta data to record the
>> > > > > > information
>> > > > > > > or
>> > > > > > > >> > not
>> > > > > > > >> > > > > grouping
>> > > > > > > >> > > > > > > > > multiple segments into one during
>> compaction.
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > > If we have use case 3 in general, I would prefer solution A and rely
>> > > > > > > >> > > > > > > > > on record timestamps.
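Solution A can be sketched as follows. This is an illustrative model only, with hypothetical names: the actual Kafka log cleaner operates on log segments on disk, not on in-memory maps.

```python
import time

def compact_with_retention(records, retention_ms, now_ms=None):
    """Illustrative sketch of 'solution A': compaction keeps only the
    latest record per key, and additionally discards any surviving
    record whose timestamp falls outside the time-based retention
    window. Names and structure are hypothetical, not Kafka internals."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    latest = {}
    for key, value, ts_ms in records:  # records arrive in offset order
        latest[key] = (value, ts_ms)   # later offsets override earlier ones
    # enforce time-based retention on the compacted result
    return {key: value for key, (value, ts_ms) in latest.items()
            if now_ms - ts_ms <= retention_ms}
```

With `retention_ms=5000` and `now_ms=10000`, a record produced at timestamp 0 is dropped even if its key was never overridden, which is exactly the guarantee case 3 currently lacks.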
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > > Two questions:
>> > > > > > > >> > > > > > > > > Do we have use case 3? Is it nice to have or must have?
>> > > > > > > >> > > > > > > > > If we have use case 3 and want to go with solution A, should we
>> > > > > > > >> > > > > > > > > introduce a new configuration to enforce deletion by timestamp?
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > > On Tue, Aug 14, 2018 at 1:52 PM, xiongqi wu <xiongq...@gmail.com> wrote:
>> > > > > > > >> > > > > > > > >> Dong,
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> Thanks for the comment.
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> There are two retention policies: log compaction and time-based retention.
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> Log compaction:
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> We have use cases that keep infinite retention of a topic
>> > > > > > > >> > > > > > > > >> (compaction only). GDPR cares about deletion of PII (personally
>> > > > > > > >> > > > > > > > >> identifiable information) data.
>> > > > > > > >> > > > > > > > >> Since Kafka doesn't know which records contain PII, it relies on the
>> > > > > > > >> > > > > > > > >> upper layer to delete those records.
>> > > > > > > >> > > > > > > > >> For those infinite-retention use cases, Kafka needs to provide a way
>> > > > > > > >> > > > > > > > >> to enforce compaction on time. This is what we try to address in this
>> > > > > > > >> > > > > > > > >> KIP.
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> Time-based retention:
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> There are also use cases where users of Kafka might want to expire
>> > > > > > > >> > > > > > > > >> all their data.
>> > > > > > > >> > > > > > > > >> In those cases, they can use time-based retention on their topics.
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> Regarding your first question: if a user wants to delete a key in a
>> > > > > > > >> > > > > > > > >> log-compacted topic, the user has to send a deletion using the same
>> > > > > > > >> > > > > > > > >> key.
>> > > > > > > >> > > > > > > > >> Kafka only makes sure the deletion will happen within a certain time
>> > > > > > > >> > > > > > > > >> period (like 2 days/7 days).
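The delete-by-same-key mechanism described above (a tombstone) can be modeled in a few lines. This is a sketch of the semantics only, not Kafka's implementation:

```python
def apply_log(records):
    """Sketch of delete semantics in a compacted topic: a record with an
    existing key and a null (None) value is a tombstone that marks the
    key for deletion; compaction eventually removes the old value and,
    after a retention period, the tombstone itself."""
    state = {}
    for key, value in records:
        if value is None:
            state.pop(key, None)  # tombstone deletes the key
        else:
            state[key] = value    # latest value wins
    return state
```

The "certain time period" in the email is the bound on when compaction actually applies the tombstone, which is what this KIP tries to make enforceable.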
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> Regarding your second question: in most cases, we might want to
>> > > > > > > >> > > > > > > > >> delete all duplicated keys at the same time.
>> > > > > > > >> > > > > > > > >> Compaction might be more efficient since we need to scan the log and
>> > > > > > > >> > > > > > > > >> find all duplicates. However, the expected use case is to set the
>> > > > > > > >> > > > > > > > >> time-based compaction interval on the order of days, and to be larger
>> > > > > > > >> > > > > > > > >> than "min compaction lag". We don't want log compaction to happen
>> > > > > > > >> > > > > > > > >> frequently since it is expensive. The purpose is to help
>> > > > > > > >> > > > > > > > >> low-production-rate topics get compacted on time. For topics with a
>> > > > > > > >> > > > > > > > >> "normal" incoming message rate, the "min dirty ratio" might have
>> > > > > > > >> > > > > > > > >> triggered the compaction before this time-based compaction policy
>> > > > > > > >> > > > > > > > >> takes effect.
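The interplay described here, dirty ratio for normal-traffic topics plus a time bound for low-traffic ones, can be sketched as a combined trigger. The logic is illustrative with hypothetical names; `max.compaction.lag.ms` is the configuration proposed in this KIP, not the actual LogCleaner code path:

```python
def needs_compaction(dirty_bytes, clean_bytes, earliest_dirty_ts_ms,
                     now_ms, min_dirty_ratio, max_compaction_lag_ms):
    """Sketch of the combined trigger: compact when the dirty-ratio
    threshold is reached (the existing rule), OR when the earliest
    uncompacted record has waited longer than max.compaction.lag.ms
    (the time-based rule proposed here)."""
    total = dirty_bytes + clean_bytes
    dirty_ratio = dirty_bytes / total if total else 0.0
    overdue = now_ms - earliest_dirty_ts_ms > max_compaction_lag_ms
    return dirty_ratio >= min_dirty_ratio or overdue
```

A low-traffic topic may never cross the ratio threshold, yet still becomes eligible once its oldest dirty record exceeds the lag; a busy topic typically trips the ratio rule first, exactly as the email argues.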
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> Eno,
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> For your question: as I mentioned, we have long-retention use cases
>> > > > > > > >> > > > > > > > >> for log-compacted topics, but we want to provide the ability to
>> > > > > > > >> > > > > > > > >> delete certain PII records on time.
>> > > > > > > >> > > > > > > > >> Kafka itself doesn't know whether a record contains sensitive
>> > > > > > > >> > > > > > > > >> information and relies on the user for deletion.
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> On Mon, Aug 13, 2018 at 6:58 PM, Dong Lin <lindon...@gmail.com> wrote:
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >>> Hey Xiongqi,
>> > > > > > > >> > > > > > > > >>>
>> > > > > > > >> > > > > > > > >>> Thanks for the KIP. I have two questions regarding the use case for
>> > > > > > > >> > > > > > > > >>> meeting the GDPR requirements.
>> > > > > > > >> > > > > > > > >>>
>> > > > > > > >> > > > > > > > >>> 1) If I recall correctly, one of the GDPR requirements is that we can
>> > > > > > > >> > > > > > > > >>> not keep messages longer than e.g. 30 days in storage (e.g. Kafka).
>> > > > > > > >> > > > > > > > >>> Say there exists a partition p0 which contains message1 with key1 and
>> > > > > > > >> > > > > > > > >>> message2 with key2. And then the user keeps producing messages with
>> > > > > > > >> > > > > > > > >>> key=key2 to this partition. Since message1 with key1 is never
>> > > > > > > >> > > > > > > > >>> overridden, sooner or later we will want to delete message1 and keep
>> > > > > > > >> > > > > > > > >>> the latest message with key=key2. But currently it looks like the log
>> > > > > > > >> > > > > > > > >>> compaction logic in Kafka will always put these messages in the same
>> > > > > > > >> > > > > > > > >>> segment. Will this be an issue?
>> > > > > > > >> > > > > > > > >>>
>> > > > > > > >> > > > > > > > >>> 2) The current KIP intends to provide the capability to delete a
>> > > > > > > >> > > > > > > > >>> given message in a log-compacted topic. Does such a use case also
>> > > > > > > >> > > > > > > > >>> require Kafka to keep the messages produced before the given message?
>> > > > > > > >> > > > > > > > >>> If yes, then we can probably just use AdminClient.deleteRecords() or
>> > > > > > > >> > > > > > > > >>> time-based log retention to meet the use-case requirement. If no, do
>> > > > > > > >> > > > > > > > >>> you know what GDPR's requirement is on time-to-deletion after a user
>> > > > > > > >> > > > > > > > >>> explicitly requests the deletion (e.g. 1 hour, 1 day, 7 days)?
>> > > > > > > >> > > > > > > > >>>
>> > > > > > > >> > > > > > > > >>> Thanks,
>> > > > > > > >> > > > > > > > >>> Dong
>> > > > > > > >> > > > > > > > >>>
>> > > > > > > >> > > > > > > > >>>
>> > > > > > > >> > > > > > > > >>> On Mon, Aug 13, 2018 at 3:44 PM, xiongqi wu <xiongq...@gmail.com> wrote:
>> > > > > > > >> > > > > > > > >>>
>> > > > > > > >> > > > > > > > >>> > Hi Eno,
>> > > > > > > >> > > > > > > > >>> >
>> > > > > > > >> > > > > > > > >>> > The GDPR request we are getting here at LinkedIn is: if we get a
>> > > > > > > >> > > > > > > > >>> > request to delete a record through a tombstone (same key, null value)
>> > > > > > > >> > > > > > > > >>> > on a log-compacted topic, we want to delete the record via compaction
>> > > > > > > >> > > > > > > > >>> > in a given time period like 2 days (whatever is required by the
>> > > > > > > >> > > > > > > > >>> > policy).
>> > > > > > > >> > > > > > > > >>> >
>> > > > > > > >> > > > > > > > >>> > There might be other issues (such as orphan log segments under
>> > > > > > > >> > > > > > > > >>> > certain conditions) that lead to GDPR problems, but they are more
>> > > > > > > >> > > > > > > > >>> > like something we need to fix anyway regardless of GDPR.
>> > > > > > > >> > > > > > > > >>> >
>> > > > > > > >> > > > > > > > >>> >
>> > > > > > > >> > > > > > > > >>> > -- Xiongqi (Wesley) Wu
>> > > > > > > >> > > > > > > > >>> >
>> > > > > > > >> > > > > > > > >>> > On Mon, Aug 13, 2018 at 2:56 PM, Eno Thereska <eno.there...@gmail.com> wrote:
>> > > > > > > >> > > > > > > > >>> >
>> > > > > > > >> > > > > > > > >>> > > Hello,
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > > Thanks for the KIP. I'd like to see a more precise definition of
>> > > > > > > >> > > > > > > > >>> > > what part of GDPR you are targeting, as well as some sort of
>> > > > > > > >> > > > > > > > >>> > > verification that this KIP actually addresses the problem. Right
>> > > > > > > >> > > > > > > > >>> > > now I find this a bit vague:
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > > "Ability to delete a log message through compaction in a timely
>> > > > > > > >> > > > > > > > >>> > > manner has become an important requirement in some use cases
>> > > > > > > >> > > > > > > > >>> > > (e.g., GDPR)"
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > > Is there any guarantee that after this KIP the GDPR problem is
>> > > > > > > >> > > > > > > > >>> > > solved, or do we need to do something else as well, e.g., more KIPs?
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > > Thanks
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > > Eno
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > > On Thu, Aug 9, 2018 at 4:18 PM, xiongqi wu <xiongq...@gmail.com> wrote:
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> > > > Hi Kafka,
>> > > > > > > >> > > > > > > > >>> > > >
>> > > > > > > >> > > > > > > > >>> > > > This KIP tries to address the GDPR concern to fulfill deletion
>> > > > > > > >> > > > > > > > >>> > > > requests on time through time-based log compaction on a
>> > > > > > > >> > > > > > > > >>> > > > compaction-enabled topic:
>> > > > > > > >> > > > > > > > >>> > > >
>> > > > > > > >> > > > > > > > >>> > > >
>> > > > > > > >> > > > > > > > >>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy
>> > > > > > > >> > > > > > > > >>> > > >
>> > > > > > > >> > > > > > > > >>> > > > Any feedback will be appreciated.
>> > > > > > > >> > > > > > > > >>> > > >
>> > > > > > > >> > > > > > > > >>> > > >
>> > > > > > > >> > > > > > > > >>> > > > Xiongqi (Wesley) Wu
>> > > > > > > >> > > > > > > > >>> > > >
>> > > > > > > >> > > > > > > > >>> > >
>> > > > > > > >> > > > > > > > >>> >
>> > > > > > > >> > > > > > > > >>>
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >> --
>> > > > > > > >> > > > > > > > >> Xiongqi (Wesley) Wu
>> > > > > > > >> > > > > > > > >>
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > > > --
>> > > > > > > >> > > > > > > > > Xiongqi (Wesley) Wu
>> > > > > > > >> > > > > > > > >
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > > > --
>> > > > > > > >> > > > > > > > Xiongqi (Wesley) Wu
>> > > > > > > >> > > > > > > >
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > --
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > Brett Rann
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > Senior DevOps Engineer
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > Zendesk International Ltd
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > 395 Collins Street, Melbourne VIC 3000
>> Australia
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > > > Mobile: +61 (0) 418 826 017
>> > > > > > > >> > > > > > >
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > > > --
>> > > > > > > >> > > > > > Xiongqi (Wesley) Wu
>> > > > > > > >> > > > > >
>> > > > > > > >> > > > >
>> > > > > > > >> > > >
>> > > > > > > >> > > >
>> > > > > > > >> > > >
>> > > > > > > >> > > > --
>> > > > > > > >> > > > Xiongqi (Wesley) Wu
>> > > > > > > >> > > >
>> > > > > > > >> > >
>> > > > > > > >> >
>> > > > > > > >> >
>> > > > > > > >> >
>> > > > > > > >> > --
>> > > > > > > >> > Xiongqi (Wesley) Wu
>> > > > > > > >> >
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Xiongqi (Wesley) Wu
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > -Regards,
>> > > Mayuresh R. Gharat
>> > > (862) 250-7125
>> > >
>> >
>>
>
