Hi Dong,

I have updated the KIP to address your comments.
One correction to my previous email: after an offline discussion with Dong, we
decided to use MAX_LONG as the default value for max.compaction.lag.ms.
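A minimal sketch of why MAX_LONG works as an "effectively disabled" default,
assuming the eligibility rule discussed in this thread (a log must be compacted
once the earliest timestamp of an un-compacted segment is older than
now - max.compaction.lag.ms). The class and method names are hypothetical and
are not taken from the Kafka code base.

```java
// Hypothetical helper for illustration only; not Kafka's LogCleaner code.
final class CompactionLagCheck {
    /** Default discussed in this thread: Long.MAX_VALUE effectively disables the feature. */
    static final long DEFAULT_MAX_COMPACTION_LAG_MS = Long.MAX_VALUE;

    /**
     * Returns true when more than maxCompactionLagMs has elapsed since the
     * segment's (estimated) earliest message timestamp.
     */
    static boolean mustCompact(long earliestTimestampMs, long nowMs, long maxCompactionLagMs) {
        long elapsedMs = nowMs - earliestTimestampMs;
        // No long value is greater than Long.MAX_VALUE, so with the default
        // lag this comparison is always false and no log is ever forced.
        return elapsedMs > maxCompactionLagMs;
    }
}
```

With a lag of 5,000 ms, a segment whose earliest message is 10,000 ms old is
selected; with the default, the same check goes through the code path but never
selects anything.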
Xiongqi (Wesley) Wu

On Mon, Oct 29, 2018 at 12:15 PM xiongqi wu <xiongq...@gmail.com> wrote:

> Hi Dong,
>
> Thank you for your comment. See my inline comments.
> I will update the KIP shortly.
>
> Xiongqi (Wesley) Wu
>
>
> On Sun, Oct 28, 2018 at 9:17 PM Dong Lin <lindon...@gmail.com> wrote:
>
>> Hey Xiongqi,
>>
>> Sorry for the late reply. I have some comments below:
>>
>> 1) As discussed earlier in the email list, if the topic is configured with
>> both deletion and compaction, in some cases messages produced a long time
>> ago cannot be deleted based on time. This is a valid use-case because we
>> actually have a topic that is configured with both deletion and compaction
>> policies. And we should enforce the semantics of both policies. Solution A
>> sounds good. We do not need an interface change (e.g. extra config) to
>> enforce solution A. All we need is to update the implementation so that
>> when the broker compacts a topic, if the message has a timestamp (which is
>> the common case), messages that are too old (based on the time-based
>> retention config) will be discarded. Since this is a valid issue and it is
>> also related to the guarantee of when a message can be deleted, can we
>> include the solution to this problem in the KIP?
>>
> ====== This makes sense. We can use a similar approach to increase the log
> start offset.
>
>>
>> 2) It is probably OK to assume that all messages have a timestamp. The
>> per-message timestamp was introduced into Kafka 0.10.0 with KIP-31 and
>> KIP-32 as of Feb 2016. Kafka 0.10.0 or earlier versions are no longer
>> supported. Also, since the use-case for this feature is primarily for
>> GDPR, we can assume that the client library has already been upgraded to
>> support SSL, which feature is added after KIP-31 and KIP-32.
>>
> =========> Ok. We can use the message timestamp to delete expired records
> if both compaction and retention are enabled.
>
>> 3) In Proposed Change section 2.a, it is said that segment.largestTimestamp
>> - maxSegmentMs can be used to determine the timestamp of the earliest
>> message. Would it be simpler to just use the create time of the file to
>> determine the time?
>>
> ========> Linux/Java doesn't provide an API for file creation time because
> some filesystem types don't provide file creation time.
>
>
>> 4) The KIP suggests using must-clean-ratio to select the partition to be
>> compacted. Unlike dirty ratio, which is mostly for performance, the logs
>> whose "must-clean-ratio" is non-zero must be compacted immediately for
>> correctness reasons (and for GDPR). And if this cannot be achieved because
>> e.g. broker compaction throughput is too low, investigation will be
>> needed. So it seems simpler to first compact logs which have a segment
>> whose earliest timestamp is earlier than now - max.compaction.lag.ms,
>> instead of defining must-clean-ratio and sorting logs based on this value.
>>
> ======> Good suggestion. This can simplify the implementation quite a bit
> if we are not too concerned about compaction of a GDPR-required partition
> being queued behind some large partition. The actual compaction completion
> time is not guaranteed anyway.
>
>
>> 5) The KIP says max.compaction.lag.ms is 0 by default and it is also
>> suggested that 0 means disable. Should we set this value to MAX_LONG by
>> default to effectively disable the feature added in this KIP?
>>
> ====> I would rather use 0 so the corresponding code path will not be
> exercised. By using MAX_LONG, we would theoretically go through related
> code to find out whether the partition is required to be compacted to
> satisfy MAX_LONG.
>
>> 6) It is probably cleaner and more readable not to include in the Public
>> Interface section those configs whose meaning is not changed.
>>
> ====> I will clean that up.
>
>> 7) The goal of this KIP is to ensure that a log segment whose earliest
>> message is earlier than a given threshold will be compacted. This goal may
>> not be achieved if the compaction throughput cannot catch up with the
>> total bytes-in-rate for the compacted topics on the broker. Thus we need
>> an easy way to tell the operator whether this goal is achieved. If we
>> don't already have such a metric, maybe we can include metrics to show
>> 1) the total number of log segments (or logs) which need to be immediately
>> compacted as determined by max.compaction.lag; and 2) the maximum value of
>> now - earliest_time_stamp_of_segment among all segments that need to be
>> compacted.
>>
> =======> Good suggestion. I will update the KIP for these metrics.
>
>> 8) The Performance Impact section suggests that users use the existing
>> metrics to monitor the performance impact of this KIP. It is useful to
>> list the meaning of each JMX metric that we want users to monitor, and
>> possibly explain how to interpret the value of these metrics to determine
>> whether there is a performance issue.
>>
> =========> I will update the KIP.
>
>> Thanks,
>> Dong
>>
>> On Tue, Oct 16, 2018 at 10:53 AM xiongqi wu <xiongq...@gmail.com> wrote:
>>
>> > Mayuresh,
>> >
>> > Thanks for the comments.
>> > The requirement is that we need to pick up segments that are older than
>> > maxCompactionLagMs for compaction.
>> > maxCompactionLagMs is an upper bound, which implies that picking up
>> > segments for compaction earlier doesn't violate the policy.
>> > We use the creation time of a segment as an estimation of its records'
>> > arrival time, so these records can be compacted no later than
>> > maxCompactionLagMs.
>> >
>> > On the other hand, compaction is an expensive operation, so we don't
>> > want to compact the log partition whenever a new segment is sealed.
>> > Therefore, we want to pick up a segment for compaction when the segment
>> > is close to the mandatory max compaction lag (so we use segment creation
>> > time as an estimation).
>> >
>> >
>> > Xiongqi (Wesley) Wu
>> >
>> >
>> > On Mon, Oct 15, 2018 at 5:54 PM Mayuresh Gharat
>> > <gharatmayures...@gmail.com> wrote:
>> >
>> > > Hi Wesley,
>> > >
>> > > Thanks for the KIP and sorry for being late to the party.
>> > > I wanted to understand the scenario you mentioned in Proposed Changes:
>> > >
>> > > > Estimate the earliest message timestamp of an un-compacted log
>> > > > segment. We only need to estimate the earliest message timestamp for
>> > > > un-compacted log segments to ensure timely compaction because the
>> > > > deletion requests that belong to compacted segments have already
>> > > > been processed.
>> > > >
>> > > > 1. For the first (earliest) log segment: the estimated earliest
>> > > >    timestamp is set to the timestamp of the first message if a
>> > > >    timestamp is present in the message. Otherwise, the estimated
>> > > >    earliest timestamp is set to
>> > > >    "segment.largestTimestamp - maxSegmentMs"
>> > > >    (segment.largestTimestamp is the lastModified time of the log
>> > > >    segment or the max timestamp we see for the log segment). In the
>> > > >    latter case, the actual timestamp of the first message might be
>> > > >    later than the estimation, but it is safe to pick up the log for
>> > > >    compaction earlier.
>> > >
>> > > When we say "actual timestamp of the first message might be later than
>> > > the estimation, but it is safe to pick up the log for compaction
>> > > earlier.", doesn't that violate the assumption that we will consider a
>> > > segment for compaction only if the creation time of the segment has
>> > > crossed "now - maxCompactionLagMs"?
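
The estimation rule quoted above can be sketched as follows. This is only an
illustration of the two cases in the quoted KIP text, assuming a segment is
rolled within maxSegmentMs of its first message; the class and method names are
hypothetical.

```java
import java.util.OptionalLong;

// Hypothetical helper for illustration only; not taken from the Kafka code base.
final class EarliestTimestampEstimate {
    /**
     * Estimates the earliest message timestamp of the first un-compacted segment.
     *
     * @param firstMessageTimestampMs timestamp of the first message, if it carries one
     * @param largestTimestampMs      lastModified time or max timestamp seen for the segment
     * @param maxSegmentMs            maximum time a segment stays active before it is rolled
     */
    static long estimate(OptionalLong firstMessageTimestampMs,
                         long largestTimestampMs,
                         long maxSegmentMs) {
        if (firstMessageTimestampMs.isPresent()) {
            // Case 1: the message carries a timestamp, so use it directly.
            return firstMessageTimestampMs.getAsLong();
        }
        // Case 2: fall back to a lower bound. The real first message may be
        // newer than this value, which only makes compaction start earlier.
        return largestTimestampMs - maxSegmentMs;
    }
}
```

Because the fallback never overestimates the first message's timestamp, the log
is picked up no later than maxCompactionLagMs requires, which matches the
"upper bound" reading given in the reply above.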
>> > > >> > > Thanks, >> > > >> > > Mayuresh >> > > >> > > On Mon, Sep 3, 2018 at 7:28 PM Brett Rann <br...@zendesk.com.invalid> >> > > wrote: >> > > >> > > > Might also be worth moving to a vote thread? Discussion seems to >> have >> > > gone >> > > > as far as it can. >> > > > >> > > > > On 4 Sep 2018, at 12:08, xiongqi wu <xiongq...@gmail.com> wrote: >> > > > > >> > > > > Brett, >> > > > > >> > > > > Yes, I will post PR tomorrow. >> > > > > >> > > > > Xiongqi (Wesley) Wu >> > > > > >> > > > > >> > > > > On Sun, Sep 2, 2018 at 6:28 PM Brett Rann >> <br...@zendesk.com.invalid >> > > >> > > > wrote: >> > > > > >> > > > > > +1 (non-binding) from me on the interface. I'd like to see >> someone >> > > > familiar >> > > > > > with >> > > > > > the code comment on the approach, and note there's a couple of >> > > > different >> > > > > > approaches: what's documented in the KIP, and what Xiaohe Dong >> was >> > > > working >> > > > > > on >> > > > > > here: >> > > > > > >> > > > > > >> > > > >> > > >> > >> https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-cleaner-compaction-max-lifetime-2.0 >> > > > > > >> > > > > > If you have code working already Xiongqi Wu could you share a >> PR? >> > I'd >> > > > be >> > > > > > happy >> > > > > > to start testing. >> > > > > > >> > > > > > On Tue, Aug 28, 2018 at 5:57 AM xiongqi wu <xiongq...@gmail.com >> > >> > > > wrote: >> > > > > > >> > > > > > > Hi All, >> > > > > > > >> > > > > > > Do you have any additional comments on this KIP? >> > > > > > > >> > > > > > > >> > > > > > > On Thu, Aug 16, 2018 at 9:17 PM, xiongqi wu < >> xiongq...@gmail.com >> > > >> > > > wrote: >> > > > > > > >> > > > > > > > on 2) >> > > > > > > > The offsetmap is built starting from dirty segment. >> > > > > > > > The compaction starts from the beginning of the log >> partition. >> > > > That's >> > > > > > how >> > > > > > > > it ensure the deletion of tomb keys. >> > > > > > > > I will double check tomorrow. 
>> > > > > > > > >> > > > > > > > Xiongqi (Wesley) Wu >> > > > > > > > >> > > > > > > > >> > > > > > > > On Thu, Aug 16, 2018 at 6:46 PM Brett Rann >> > > > <br...@zendesk.com.invalid> >> > > > > > > > wrote: >> > > > > > > > >> > > > > > > >> To just clarify a bit on 1. whether there's an external >> > > storage/DB >> > > > > > isn't >> > > > > > > >> relevant here. >> > > > > > > >> Compacted topics allow a tombstone record to be sent (a >> null >> > > value >> > > > > > for a >> > > > > > > >> key) which >> > > > > > > >> currently will result in old values for that key being >> deleted >> > > if >> > > > some >> > > > > > > >> conditions are met. >> > > > > > > >> There are existing controls to make sure the old values >> will >> > > stay >> > > > > > around >> > > > > > > >> for a minimum >> > > > > > > >> time at least, but no dedicated control to ensure the >> > tombstone >> > > > will >> > > > > > > >> delete >> > > > > > > >> within a >> > > > > > > >> maximum time. >> > > > > > > >> >> > > > > > > >> One popular reason that maximum time for deletion is >> desirable >> > > > right >> > > > > > now >> > > > > > > >> is >> > > > > > > >> GDPR with >> > > > > > > >> PII. But we're not proposing any GDPR awareness in kafka, >> just >> > > > being >> > > > > > > able >> > > > > > > >> to guarantee >> > > > > > > >> a max time where a tombstoned key will be removed from the >> > > > compacted >> > > > > > > >> topic. >> > > > > > > >> >> > > > > > > >> on 2) >> > > > > > > >> huh, i thought it kept track of the first dirty segment and >> > > didn't >> > > > > > > >> recompact older "clean" ones. >> > > > > > > >> But I didn't look at code or test for that. 
>> > > > > > > >> >> > > > > > > >> On Fri, Aug 17, 2018 at 10:57 AM xiongqi wu < >> > > xiongq...@gmail.com> >> > > > > > > wrote: >> > > > > > > >> >> > > > > > > >> > 1, Owner of data (in this sense, kafka is the not the >> owner >> > of >> > > > data) >> > > > > > > >> > should keep track of lifecycle of the data in some >> external >> > > > > > > storage/DB. >> > > > > > > >> > The owner determines when to delete the data and send the >> > > delete >> > > > > > > >> request to >> > > > > > > >> > kafka. Kafka doesn't know about the content of data but >> to >> > > > provide a >> > > > > > > >> mean >> > > > > > > >> > for deletion. >> > > > > > > >> > >> > > > > > > >> > 2 , each time compaction runs, it will start from first >> > > > segments (no >> > > > > > > >> > matter if it is compacted or not). The time estimation >> here >> > is >> > > > only >> > > > > > > used >> > > > > > > >> > to determine whether we should run compaction on this log >> > > > partition. >> > > > > > > So >> > > > > > > >> we >> > > > > > > >> > only need to estimate uncompacted segments. >> > > > > > > >> > >> > > > > > > >> > On Thu, Aug 16, 2018 at 5:35 PM, Dong Lin < >> > > lindon...@gmail.com> >> > > > > > > wrote: >> > > > > > > >> > >> > > > > > > >> > > Hey Xiongqi, >> > > > > > > >> > > >> > > > > > > >> > > Thanks for the update. I have two questions for the >> latest >> > > > KIP. >> > > > > > > >> > > >> > > > > > > >> > > 1) The motivation section says that one use case is to >> > > delete >> > > > PII >> > > > > > > >> > (Personal >> > > > > > > >> > > Identifiable information) data within 7 days while >> keeping >> > > > non-PII >> > > > > > > >> > > indefinitely in compacted format. I suppose the >> use-case >> > > > depends >> > > > > > on >> > > > > > > >> the >> > > > > > > >> > > application to determine when to delete those PII data. 
>> > > Could >> > > > you >> > > > > > > >> explain >> > > > > > > >> > > how can application reliably determine the set of keys >> > that >> > > > should >> > > > > > > be >> > > > > > > >> > > deleted? Is application required to always messages >> from >> > the >> > > > topic >> > > > > > > >> after >> > > > > > > >> > > every restart and determine the keys to be deleted by >> > > looking >> > > > at >> > > > > > > >> message >> > > > > > > >> > > timestamp, or is application supposed to persist the >> key-> >> > > > > > timstamp >> > > > > > > >> > > information in a separate persistent storage system? >> > > > > > > >> > > >> > > > > > > >> > > 2) It is mentioned in the KIP that "we only need to >> > estimate >> > > > > > > earliest >> > > > > > > >> > > message timestamp for un-compacted log segments because >> > the >> > > > > > deletion >> > > > > > > >> > > requests that belong to compacted segments have already >> > been >> > > > > > > >> processed". >> > > > > > > >> > > Not sure if it is correct. If a segment is compacted >> > before >> > > > user >> > > > > > > sends >> > > > > > > >> > > message to delete a key in this segment, it seems that >> we >> > > > still >> > > > > > need >> > > > > > > >> to >> > > > > > > >> > > ensure that the segment will be compacted again within >> the >> > > > given >> > > > > > > time >> > > > > > > >> > after >> > > > > > > >> > > the deletion is requested, right? 
>> > > > > > > >> > > >> > > > > > > >> > > Thanks, >> > > > > > > >> > > Dong >> > > > > > > >> > > >> > > > > > > >> > > On Thu, Aug 16, 2018 at 10:27 AM, xiongqi wu < >> > > > xiongq...@gmail.com >> > > > > > > >> > > > > > > >> > wrote: >> > > > > > > >> > > >> > > > > > > >> > > > Hi Xiaohe, >> > > > > > > >> > > > >> > > > > > > >> > > > Quick note: >> > > > > > > >> > > > 1) Use minimum of segment.ms and >> max.compaction.lag.ms >> > > > > > > >> > > > <http://max.compaction.ms >> > > > > > > <http://max.compaction.ms> >> > > > > > > >> > <http://max.compaction.ms >> > > > > > > <http://max.compaction.ms>>> >> > > > > > > >> > > > >> > > > > > > >> > > > 2) I am not sure if I get your second question. >> first, >> > we >> > > > have >> > > > > > > >> jitter >> > > > > > > >> > > when >> > > > > > > >> > > > we roll the active segment. second, on each >> compaction, >> > we >> > > > > > compact >> > > > > > > >> upto >> > > > > > > >> > > > the offsetmap could allow. Those will not lead to >> > perfect >> > > > > > > compaction >> > > > > > > >> > > storm >> > > > > > > >> > > > overtime. In addition, I expect we are setting >> > > > > > > >> max.compaction.lag.ms >> > > > > > > >> > on >> > > > > > > >> > > > the order of days. >> > > > > > > >> > > > >> > > > > > > >> > > > 3) I don't have access to the confluent community >> slack >> > > for >> > > > > > now. I >> > > > > > > >> am >> > > > > > > >> > > > reachable via the google handle out. >> > > > > > > >> > > > To avoid the double effort, here is my plan: >> > > > > > > >> > > > a) Collect more feedback and feature requriement on >> the >> > > KIP. >> > > > > > > >> > > > b) Wait unitl this KIP is approved. >> > > > > > > >> > > > c) I will address any additional requirements in the >> > > > > > > implementation. 
>> > > > > > > >> > (My >> > > > > > > >> > > > current implementation only complies to whatever >> > described >> > > > in >> > > > > > the >> > > > > > > >> KIP >> > > > > > > >> > > now) >> > > > > > > >> > > > d) I can share the code with the you and community >> see >> > you >> > > > want >> > > > > > to >> > > > > > > >> add >> > > > > > > >> > > > anything. >> > > > > > > >> > > > e) submission through committee >> > > > > > > >> > > > >> > > > > > > >> > > > >> > > > > > > >> > > > On Wed, Aug 15, 2018 at 11:42 PM, XIAOHE DONG < >> > > > > > > >> dannyriv...@gmail.com> >> > > > > > > >> > > > wrote: >> > > > > > > >> > > > >> > > > > > > >> > > > > Hi Xiongqi >> > > > > > > >> > > > > >> > > > > > > >> > > > > Thanks for thinking about implementing this as >> well. >> > :) >> > > > > > > >> > > > > >> > > > > > > >> > > > > I was thinking about using `segment.ms` to trigger >> > the >> > > > > > segment >> > > > > > > >> roll. >> > > > > > > >> > > > > Also, its value can be the largest time bias for >> the >> > > > record >> > > > > > > >> deletion. >> > > > > > > >> > > For >> > > > > > > >> > > > > example, if the `segment.ms` is 1 day and ` >> > > > max.compaction.ms` >> > > > > > > is >> > > > > > > >> 30 >> > > > > > > >> > > > days, >> > > > > > > >> > > > > the compaction may happen around 31 days. >> > > > > > > >> > > > > >> > > > > > > >> > > > > For my curiosity, is there a way we can do some >> > > > performance >> > > > > > test >> > > > > > > >> for >> > > > > > > >> > > this >> > > > > > > >> > > > > and any tools you can recommend. As you know, >> > > previously, >> > > > it >> > > > > > is >> > > > > > > >> > cleaned >> > > > > > > >> > > > up >> > > > > > > >> > > > > by respecting dirty ratio, but now it may happen >> > anytime >> > > > if >> > > > > > max >> > > > > > > >> lag >> > > > > > > >> > has >> > > > > > > >> > > > > passed for each message. 
I wonder what would >> happen if >> > > > clients >> > > > > > > >> send >> > > > > > > >> > > huge >> > > > > > > >> > > > > amount of tombstone records at the same time. >> > > > > > > >> > > > > >> > > > > > > >> > > > > I am looking forward to have a quick chat with you >> to >> > > > avoid >> > > > > > > double >> > > > > > > >> > > effort >> > > > > > > >> > > > > on this. I am in confluent community slack during >> the >> > > work >> > > > > > time. >> > > > > > > >> My >> > > > > > > >> > > name >> > > > > > > >> > > > is >> > > > > > > >> > > > > Xiaohe Dong. :) >> > > > > > > >> > > > > >> > > > > > > >> > > > > Rgds >> > > > > > > >> > > > > Xiaohe Dong >> > > > > > > >> > > > > >> > > > > > > >> > > > > >> > > > > > > >> > > > > >> > > > > > > >> > > > > On 2018/08/16 01:22:22, xiongqi wu < >> > xiongq...@gmail.com >> > > > >> > > > > > wrote: >> > > > > > > >> > > > > > Brett, >> > > > > > > >> > > > > > >> > > > > > > >> > > > > > Thank you for your comments. >> > > > > > > >> > > > > > I was thinking since we already has immediate >> > > compaction >> > > > > > > >> setting by >> > > > > > > >> > > > > setting >> > > > > > > >> > > > > > min dirty ratio to 0, so I decide to use "0" as >> > > disabled >> > > > > > > state. >> > > > > > > >> > > > > > I am ok to go with -1(disable), 0 (immediate) >> > options. 
>> > > > > > > >> > > > > > >> > > > > > > >> > > > > > For the implementation, there are a few >> differences >> > > > between >> > > > > > > mine >> > > > > > > >> > and >> > > > > > > >> > > > > > "Xiaohe Dong"'s : >> > > > > > > >> > > > > > 1) I used the estimated creation time of a log >> > segment >> > > > > > instead >> > > > > > > >> of >> > > > > > > >> > > > largest >> > > > > > > >> > > > > > timestamp of a log to determine the compaction >> > > > eligibility, >> > > > > > > >> > because a >> > > > > > > >> > > > log >> > > > > > > >> > > > > > segment might stay as an active segment up to >> "max >> > > > > > compaction >> > > > > > > >> lag". >> > > > > > > >> > > > (see >> > > > > > > >> > > > > > the KIP for detail). >> > > > > > > >> > > > > > 2) I measure how much bytes that we must clean to >> > > > follow the >> > > > > > > >> "max >> > > > > > > >> > > > > > compaction lag" rule, and use that to determine >> the >> > > > order of >> > > > > > > >> > > > compaction. >> > > > > > > >> > > > > > 3) force active segment to roll to follow the >> "max >> > > > > > compaction >> > > > > > > >> lag" >> > > > > > > >> > > > > > >> > > > > > > >> > > > > > I can share my code so we can coordinate. >> > > > > > > >> > > > > > >> > > > > > > >> > > > > > I haven't think about a new API to force a >> > compaction. >> > > > what >> > > > > > is >> > > > > > > >> the >> > > > > > > >> > > use >> > > > > > > >> > > > > case >> > > > > > > >> > > > > > for this one? >> > > > > > > >> > > > > > >> > > > > > > >> > > > > > >> > > > > > > >> > > > > > On Wed, Aug 15, 2018 at 5:33 PM, Brett Rann >> > > > > > > >> > > <br...@zendesk.com.invalid >> > > > > > > >> > > > > >> > > > > > > >> > > > > > wrote: >> > > > > > > >> > > > > > >> > > > > > > >> > > > > > > We've been looking into this too. 
>> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > Mailing list: >> > > > > > > >> > > > > > > https://lists.apache.org/thread.html/ >> > > > > > > <https://lists.apache.org/thread.html/> >> > > > > > > >> > <https://lists.apache.org/thread.html/ >> > > > > > > <https://lists.apache.org/thread.html/>> >> > > > > > > >> > > ed7f6a6589f94e8c2a705553f364ef >> > > > > > > >> > > > > > > 599cb6915e4c3ba9b561e610e4@% >> > 3Cdev.kafka.apache.org >> > > %3E >> > > > > > > >> > > > > > > jira wish: >> > > > > > https://issues.apache.org/jira/browse/KAFKA-7137 >> > > > > > > <https://issues.apache.org/jira/browse/KAFKA-7137> >> > > > > > > >> > <https://issues.apache.org/jira/browse/KAFKA-7137 >> > > > > > > <https://issues.apache.org/jira/browse/KAFKA-7137>> >> > > > > > > >> > > > > > > confluent slack discussion: >> > > > > > > >> > > > > > > >> > > > https://confluentcommunity.slack.com/archives/C49R61XMM/ >> > > > > > > <https://confluentcommunity.slack.com/archives/C49R61XMM/> >> > > > > > > >> > < >> https://confluentcommunity.slack.com/archives/C49R61XMM/ >> > > > > > > <https://confluentcommunity.slack.com/archives/C49R61XMM/>> >> > > > > > > >> > > > > p1530760121000039 >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > A person on my team has started on code so you >> > might >> > > > want >> > > > > > to >> > > > > > > >> > > > > coordinate: >> > > > > > > >> > > > > > > >> > > > https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log- >> > > > > > > <https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log-> >> > > > > > > >> > < >> https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log- >> > > > > > > <https://github.com/dongxiaohe/kafka/tree/dongxiaohe/log->> >> > > > > > > >> > > > > > > cleaner-compaction-max-lifetime-2.0 >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > He's been working with Jason Gustafson and >> James >> > > Chen >> > > > > > around >> > > > > > > >> the >> > > > > > > >> > > > > changes. 
>> > > > > > > >> > > > > > > You can ping him on confluent slack as Xiaohe >> > Dong. >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > It's great to know others are thinking on it as >> > > well. >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > You've added the requirement to force a segment >> > roll >> > > > which >> > > > > > > we >> > > > > > > >> > > hadn't >> > > > > > > >> > > > > gotten >> > > > > > > >> > > > > > > to yet, which is great. I was content with it >> not >> > > > > > including >> > > > > > > >> the >> > > > > > > >> > > > active >> > > > > > > >> > > > > > > segment. >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > Adding topic level configuration " >> > > > max.compaction.lag.ms >> > > > > > ", >> > > > > > > >> and >> > > > > > > >> > > > > > > corresponding broker configuration " >> > > > > > > >> > log.cleaner.max.compaction.la >> > > > > > > >> > > > g.ms >> > > > > > > >> > > > > ", >> > > > > > > >> > > > > > > which is set to 0 (disabled) by default. >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > Glancing at some other settings convention >> seems >> > to >> > > > me to >> > > > > > be >> > > > > > > >> -1 >> > > > > > > >> > for >> > > > > > > >> > > > > > > disabled (or infinite, which is more meaningful >> > > > here). 0 >> > > > > > to >> > > > > > > me >> > > > > > > >> > > > implies >> > > > > > > >> > > > > > > instant, a little quicker than 1. >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > We've been trying to think about a way to >> trigger >> > > > > > compaction >> > > > > > > >> as >> > > > > > > >> > > well >> > > > > > > >> > > > > > > through an API call, which would need to be >> > flagged >> > > > > > > somewhere >> > > > > > > >> (ZK >> > > > > > > >> > > > > admin/ >> > > > > > > >> > > > > > > space?) 
but we're struggling to think how that >> > would >> > > > be >> > > > > > > >> > coordinated >> > > > > > > >> > > > > across >> > > > > > > >> > > > > > > brokers and partitions. Have you given any >> thought >> > > to >> > > > > > that? >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > On Thu, Aug 16, 2018 at 8:44 AM xiongqi wu < >> > > > > > > >> xiongq...@gmail.com> >> > > > > > > >> > > > > wrote: >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > Eno, Dong, >> > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > I have updated the KIP. We decide not to >> address >> > > the >> > > > > > issue >> > > > > > > >> that >> > > > > > > >> > > we >> > > > > > > >> > > > > might >> > > > > > > >> > > > > > > > have for both compaction and time retention >> > > enabled >> > > > > > topics >> > > > > > > >> (see >> > > > > > > >> > > the >> > > > > > > >> > > > > > > > rejected alternative item 2). This KIP will >> only >> > > > ensure >> > > > > > > log >> > > > > > > >> can >> > > > > > > >> > > be >> > > > > > > >> > > > > > > > compacted after a specified time-interval. >> > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > As suggested by Dong, we will also enforce " >> > > > > > > >> > > max.compaction.lag.ms" >> > > > > > > >> > > > > is >> > > > > > > >> > > > > > > not >> > > > > > > >> > > > > > > > less than "min.compaction.lag.ms". 
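
A minimal sketch of the constraint mentioned above, that max.compaction.lag.ms
must not be less than min.compaction.lag.ms. The class name, method name and
exception choice are assumptions for illustration; the actual validation in
Kafka may be placed and reported differently.

```java
// Hypothetical validation helper; not the actual Kafka config validation.
final class CompactionLagConfigCheck {
    /** Rejects configurations where the max lag is smaller than the min lag. */
    static void validate(long minCompactionLagMs, long maxCompactionLagMs) {
        if (maxCompactionLagMs < minCompactionLagMs) {
            throw new IllegalArgumentException(
                "max.compaction.lag.ms (" + maxCompactionLagMs
                + ") must not be less than min.compaction.lag.ms ("
                + minCompactionLagMs + ")");
        }
    }
}
```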
>> > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354 >> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-354> >> > > > > > > >> > < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-354 >> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-354>> >> > > > > > > >> > > > > Time-based >> > > > > > > >> > > > > > > log >> > > > > > > >> > > > > > > > compaction policy >> > > > > > > >> > > > > > > > < >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354 >> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-354> >> > > > > > > >> > < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-354 >> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-354>> >> > > > > > > >> > > > > Time-based >> > > > > > > >> > > > > > > log compaction policy> >> > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > On Tue, Aug 14, 2018 at 5:01 PM, xiongqi wu < >> > > > > > > >> > xiongq...@gmail.com >> > > > > > > >> > > > >> > > > > > > >> > > > > wrote: >> > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > >> > > > > > > >> > > > > > > > > Per discussion with Dong, he made a very >> good >> > > > point >> > > > > > that >> > > > > > > >> if >> > > > > > > >> > > > > compaction >> > > > > > > >> > > > > > > > > and time based retention are both enabled >> on a >> > > > topic, >> > > > > > > the >> > > > > > > >> > > > > compaction >> > > > > > > >> > > > > > > > might >> > > > > > > >> > > > > > > > > prevent records from being deleted on time. 
>> > The >> > > > reason >> > > > > > > is >> > > > > > > >> > when >> > > > > > > >> > > > > > > compacting >> > > > > > > >> > > > > > > > > multiple segments into one single segment, >> the >> > > > newly >> > > > > > > >> created >> > > > > > > >> > > > > segment >> > > > > > > >> > > > > > > will >> > > > > > > >> > > > > > > > > have same lastmodified timestamp as latest >> > > > original >> > > > > > > >> segment. >> > > > > > > >> > We >> > > > > > > >> > > > > lose >> > > > > > > >> > > > > > > the >> > > > > > > >> > > > > > > > > timestamp of all original segments except >> the >> > > last >> > > > > > one. >> > > > > > > >> As a >> > > > > > > >> > > > > result, >> > > > > > > >> > > > > > > > > records might not be deleted as it should >> be >> > > > through >> > > > > > > time >> > > > > > > >> > based >> > > > > > > >> > > > > > > > retention. >> > > > > > > >> > > > > > > > > >> > > > > > > >> > > > > > > > > With the current KIP proposal, if we want >> to >> > > > ensure >> > > > > > > timely >> > > > > > > >> > > > > deletion, we >> > > > > > > >> > > > > > > > > have the following configurations: >> > > > > > > >> > > > > > > > > 1) enable time based log compaction only : >> > > > deletion is >> > > > > > > >> done >> > > > > > > >> > > > though >> > > > > > > >> > > > > > > > > overriding the same key >> > > > > > > >> > > > > > > > > 2) enable time based log retention only: >> > > deletion >> > > > is >> > > > > > > done >> > > > > > > >> > > though >> > > > > > > >> > > > > > > > > time-based retention >> > > > > > > >> > > > > > > > > 3) enable both log compaction and time >> based >> > > > > > retention: >> > > > > > > >> > > Deletion >> > > > > > > >> > > > > is not >> > > > > > > >> > > > > > > > > guaranteed. 
>
> Not sure if we have use case 3 and also want deletion to happen on
> time. There are several options to address the deletion issue when
> both compaction and retention are enabled:
> A) During log compaction, look at the record timestamp to delete
>    expired records. This can be done in the compaction logic itself
>    or by using AdminClient.deleteRecords(). But this assumes we have
>    record timestamps.
> B) Retain the lastModified time of the original segments during log
>    compaction. This requires either extra metadata to record the
>    information or not grouping multiple segments into one during
>    compaction.
>
> If we have use case 3 in general, I would prefer solution A and rely
> on record timestamps.
>
> Two questions:
> Do we have use case 3? Is it nice to have or must have?
> If we have use case 3 and want to go with solution A, should we
> introduce a new configuration to enforce deletion by timestamp?
>
> On Tue, Aug 14, 2018 at 1:52 PM, xiongqi wu <xiongq...@gmail.com> wrote:
>
> Dong,
>
> Thanks for the comment.
>
> There are two retention policies: log compaction and time-based
> retention.
>
> Log compaction:
>
> We have use cases that keep infinite retention of a topic (compaction
> only). GDPR cares about deletion of PII (personally identifiable
> information) data. Since Kafka doesn't know which records contain
> PII, it relies on the upper layer to delete those records. For those
> infinite-retention use cases, Kafka needs to provide a way to enforce
> compaction on time. This is what we try to address in this KIP.
>
> Time-based retention:
>
> There are also use cases where users of Kafka might want to expire
> all their data. In those cases, they can use time-based retention on
> their topics.
>
> Regarding your first question, if a user wants to delete a key in a
> log-compacted topic, the user has to send a deletion using the same
> key. Kafka only makes sure the deletion will happen within a certain
> time period (like 2 days/7 days).
>
> Regarding your second question: in most cases, we might want to
> delete all duplicated keys at the same time. Compaction might be more
> efficient since we need to scan the log and find all duplicates.
> However, the expected use case is to set the time-based compaction
> interval on the order of days, and to be larger than "min compaction
> lag". We don't want log compaction to happen frequently since it is
> expensive. The purpose is to help low-production-rate topics get
> compacted on time. For topics with a "normal" incoming message rate,
> the "min dirty ratio" might have triggered the compaction before this
> time-based compaction policy takes effect.
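The interplay between the "min dirty ratio" trigger and the proposed time-based trigger can be sketched roughly as follows. This is a simplified model rather than the broker's actual log cleaner logic: `LogState` and `should_compact` are hypothetical names, and the parameters `min_dirty_ratio` and `max_compaction_lag_ms` are assumed here to correspond to the topic configs `min.cleanable.dirty.ratio` and `max.compaction.lag.ms`.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class LogState:
    # Hypothetical summary of one partition's log.
    clean_bytes: int                  # bytes already compacted
    dirty_bytes: int                  # bytes written since last compaction
    earliest_dirty_timestamp_ms: int  # timestamp of the oldest uncompacted record


def dirty_ratio(log: LogState) -> float:
    total = log.clean_bytes + log.dirty_bytes
    return log.dirty_bytes / total if total else 0.0


def should_compact(log: LogState, now_ms: int,
                   min_dirty_ratio: float, max_compaction_lag_ms: int) -> bool:
    """Compact when either the dirty ratio or the time bound is reached."""
    if log.dirty_bytes == 0:
        return False  # nothing to compact
    if dirty_ratio(log) >= min_dirty_ratio:
        return True   # the existing ratio-based trigger fires first on busy topics
    # Time-based trigger: the oldest dirty record has waited too long.
    return now_ms - log.earliest_dirty_timestamp_ms >= max_compaction_lag_ms


now = 10 * DAY_MS
busy = LogState(clean_bytes=400, dirty_bytes=600, earliest_dirty_timestamp_ms=now - DAY_MS)
quiet_old = LogState(clean_bytes=1000, dirty_bytes=10, earliest_dirty_timestamp_ms=now - 3 * DAY_MS)
quiet_new = LogState(clean_bytes=1000, dirty_bytes=10, earliest_dirty_timestamp_ms=now - DAY_MS)

print(should_compact(busy, now, 0.5, 2 * DAY_MS))       # True  (dirty ratio)
print(should_compact(quiet_old, now, 0.5, 2 * DAY_MS))  # True  (time bound)
print(should_compact(quiet_new, now, 0.5, 2 * DAY_MS))  # False
```

The low-rate topic never reaches the dirty ratio, so only the time bound makes it eligible. This matches the stated purpose of helping low-production-rate topics get compacted on time.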
>
> Eno,
>
> For your question, like I mentioned, we have a long-time retention
> use case for log-compacted topics, but we want to provide the ability
> to delete certain PII records on time. Kafka itself doesn't know
> whether a record contains sensitive information and relies on the
> user for deletion.
>
> On Mon, Aug 13, 2018 at 6:58 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> Hey Xiongqi,
>
> Thanks for the KIP. I have two questions regarding the use-case for
> meeting the GDPR requirement.
>
> 1) If I recall correctly, one of the GDPR requirements is that we
> cannot keep messages longer than e.g. 30 days in storage (e.g.
> Kafka). Say there exists a partition p0 which contains message1 with
> key1 and message2 with key2. And then the user keeps producing
> messages with key=key2 to this partition. Since message1 with key1 is
> never overridden, sooner or later we will want to delete message1 and
> keep the latest message with key=key2. But currently it looks like
> the log compaction logic in Kafka will always put these messages in
> the same segment. Will this be an issue?
>
> 2) The current KIP intends to provide the capability to delete a
> given message in a log-compacted topic. Does such a use-case also
> require Kafka to keep the messages produced before the given message?
> If yes, then we can probably just use AdminClient.deleteRecords() or
> time-based log retention to meet the use-case requirement. If no, do
> you know what GDPR's requirement is on time-to-deletion after the
> user explicitly requests the deletion (e.g. 1 hour, 1 day, 7 days)?
>
> Thanks,
> Dong
>
> On Mon, Aug 13, 2018 at 3:44 PM, xiongqi wu <xiongq...@gmail.com> wrote:
>
> Hi Eno,
>
> The GDPR request we are getting here at LinkedIn is: if we get a
> request to delete a record through a null value for its key (a
> tombstone) on a log-compacted topic, we want to delete the record via
> compaction within a given time period like 2 days (whatever is
> required by the policy).
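The two scenarios discussed above (a key that is never overridden, and an explicit deletion request) can be walked through with a toy model of compaction. This is a minimal sketch, not the broker's implementation: `compact` is a hypothetical helper, records are plain `(offset, key, value)` tuples, and a `None` value stands in for a deletion marker. The model drops deletion markers immediately, whereas a real broker keeps them for a while (see `delete.retention.ms`).

```python
from typing import Dict, List, Optional, Tuple

# (offset, key, value); value None models a deletion marker (tombstone).
Record = Tuple[int, str, Optional[str]]


def compact(records: List[Record], drop_tombstones: bool = True) -> List[Record]:
    """Keep only the latest record per key, optionally dropping tombstones."""
    latest_offset: Dict[str, int] = {}
    for offset, key, _ in records:
        latest_offset[key] = offset  # later records override earlier ones

    compacted = []
    for offset, key, value in records:
        if latest_offset[key] != offset:
            continue  # superseded by a newer record with the same key
        if value is None and drop_tombstones:
            continue  # the key is deleted entirely
        compacted.append((offset, key, value))
    return compacted


log: List[Record] = [
    (0, "key1", "message1"),
    (1, "key2", "message2"),
    (2, "key2", "message3"),
    (3, "key2", "message4"),
]

# key1 is never overridden, so message1 survives every compaction pass.
print(compact(log))  # [(0, 'key1', 'message1'), (3, 'key2', 'message4')]

# An explicit deletion request for key1 removes it at the next compaction.
log.append((4, "key1", None))
print(compact(log))  # [(3, 'key2', 'message4')]
```

In this model the deletion only takes effect when a compaction pass actually runs. That is why the thread focuses on bounding how long a pass can be delayed.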
>
> There might be other issues (such as orphan log segments under
> certain conditions) that lead to GDPR problems, but they are more
> like something we need to fix anyway regardless of GDPR.
>
> -- Xiongqi (Wesley) Wu
>
> On Mon, Aug 13, 2018 at 2:56 PM, Eno Thereska <eno.there...@gmail.com> wrote:
>
> Hello,
>
> Thanks for the KIP. I'd like to see a more precise definition of what
> part of GDPR you are targeting as well as some sort of verification
> that this KIP actually addresses the problem. Right now I find this a
> bit vague:
>
> "Ability to delete a log message through compaction in a timely
> manner has become an important requirement in some use cases (e.g.,
> GDPR)"
>
> Is there any guarantee that after this KIP the GDPR problem is
> solved, or do we need to do something else as well, e.g., more KIPs?
>
> Thanks
>
> Eno
>
> On Thu, Aug 9, 2018 at 4:18 PM, xiongqi wu <xiongq...@gmail.com> wrote:
>
> Hi Kafka,
>
> This KIP tries to address the GDPR concern of fulfilling deletion
> requests on time through time-based log compaction on a
> compaction-enabled topic:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy
>
> Any feedback will be appreciated.
>
> Xiongqi (Wesley) Wu
>
> --
> Brett Rann
> Senior DevOps Engineer
> Zendesk International Ltd
> 395 Collins Street, Melbourne VIC 3000 Australia
> Mobile: +61 (0) 418 826 017
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125