Hi Guozhang, Sure and I have made a note in the JIRA item to make sure the wiki is updated.
Thanks, Senthil -----Original Message----- From: Guozhang Wang <wangg...@gmail.com> Sent: Monday, November 4, 2019 11:00 AM To: dev <dev@kafka.apache.org> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction Hello Senthilnathan, Thanks for revamping on the KIP. I have only one comment about the wiki otherwise LGTM. 1. We should emphasize that the newly introduced config yields to the existing "log.cleanup.policy", i.e. if the latter's value is `delete` not `compact`, then the previous config would be ignored. Guozhang On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy <senth...@microsoft.com.invalid> wrote: > Hi all, > > I will start the vote thread shortly for this updated KIP. If there > are any more thoughts I would love to hear them. > > Thanks, > Senthil > > -----Original Message----- > From: Senthilnathan Muthusamy <senth...@microsoft.com.INVALID> > Sent: Thursday, October 31, 2019 3:51 AM > To: dev@kafka.apache.org > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction > > Hi Matthias > > Thanks for the response. > > (1) Yes > > (2) Yes, and the config name will be the same (i.e. > `log.cleaner.compaction.strategy` & > `log.cleaner.compaction.strategy.header`) at broker level and topic > level (to override broker level default compact strategy). Please let > me know if we need to keep it in different naming convention. Note: > Broker level (which will be in the server.properties) configuration is > optional and default it to offset. Topic level configuration will be > default to broker level config... > > (3) By this new way, it avoids another config parameter and also in > feature if any new strategy like header need addition info, no > additional config required. As this got discussed already and agreed > to have separate config, I will revert it. KIP updated... > > (4) Done > > (5) Updated > > (6) Updated to pick the first header in the list > > Please let me know if you have any other questions. > > Thanks, > Senthil > > -----Original Message----- > From: Matthias J. Sax <matth...@confluent.io> > Sent: Thursday, October 31, 2019 12:13 AM > To: dev@kafka.apache.org > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > Thanks for picking up this KIP, Senthil. > > (1) As far as I remember, the main issue of the original proposal was > a missing topic level configuration for the compaction strategy. With > this being addressed, I am in favor of this KIP. > > (2) With regard to (1), it seems we would need a new topic level > config `compaction.strategy`, and `log.cleaner.compaction.strategy` > would be the default strategy (ie, broker level config) if a topic does not > overwrite it? > > (3) Why did you remove `log.cleaner.compaction.strategy.header` > parameter and change the accepted values of > `log.cleaner.compaction.strategy` to "header.<key>" instead of keeping > "header"? The original approach seems to be cleaner, and I think this > was discussed on the original discuss thread already. > > (4) Nit: For the "timestamp" compaction strategy you changed the KIP > to > > -> `The record [create] timestamp` > > This is miss leading IMHO, because it depends on the broker/log > configuration `(log.)message.timestamp.type` that can either be > `CreateTime` or `LogAppendTime` what the actual record timestamp is. I > would just remove "create" to keep it unspecified. > > (5) Nit: the section "Public Interfaces" should list the newly > introduced configs -- configuration parameters are a public interface. > > (6) What do you mean by "first level header lookup"? The term "first > level" indicates some hierarchy, but headers don't have any hierarchy > -- it's just a list of key-value pairs? If you mean the _order_ of the > headers, ie, pick the first header in the list that matches the key, > please rephrase it to make it clearer. > > > > @Tom: I agree with all you are saying, however, I still think that > this KIP will improve the overall situation, because everything you > pointed out is actually true with offset based compaction, too. > > The KIP is not a silver bullet that solves all issue for interleaved > writes, but I personally believe, it's a good improvement. > > > > -Matthias > > > On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote: > > Hi, > > > > Please let me know if anyone has any questions on this updated KIP-280... > > > > Thanks, > > > > Senthil > > > > -----Original Message----- > > From: Senthilnathan Muthusamy <senth...@microsoft.com.INVALID> > > Sent: Monday, October 28, 2019 11:36 PM > > To: dev@kafka.apache.org > > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction > > > > Hi Tom, > > > > Sorry for the delayed response. > > > > Regarding the fall back to offset decision for both timestamp & > > header > value is based on the previous author discuss > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist > s.apache.org%2Fthread.html%2Ff44317eb6cd34f91966654c80509d4a457dbbccdd > 02b86645782be67%40%253Cdev.kafka.apache.org%253E&data=02%7C01%7Cse > nthilm%40microsoft.com%7C8f6cae776082459c793b08d761595294%7C72f988bf86 > f141af91ab2d7cd011db47%7C1%7C0%7C637085022516423400&sdata=WpEW5ylu > %2FsLMyGS2ULWDZ7vA1OzQwFYWSuioLCbABhM%3D&reserved=0 > and as per the discussion, it is really required to avoid duplicates. > > > > And the timestamp strategy is from the original KIP author and we > > are > keeping it as is. > > > > Finally on the sequence order guarantee by the producer, it is not > feasible on waiting for ack in async / multi-threads/processes > scenarios and hence the header sequence based compact strategy with > producer's responsibility to have a unique sequence generation for the > topic-partition-key level. > > > > Hoping this clarifies all your questions. Please let us know if you > > have > any further questions. > > > > @Guozhang Wang / @Matthias J. Sax, I see you both had a detail > discussion on the original KIP with previous author and it would great > to hear your inputs as well. > > > > Thanks, > > Senthil > > > > -----Original Message----- > > From: Tom Bentley <tbent...@redhat.com> > > Sent: Tuesday, October 22, 2019 2:32 AM > > To: dev@kafka.apache.org > > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction > > > > Hi Senthilnathan, > > > > In the motivation isn't it a little misleading to say "On the > > producer side, we clearly preserve an order for the two messages, > > <K1, V1> <K1, > > V2>"? IMHO, the semantics of the producer are clear that having an > > V2>observed > > order of sending records from different producers is not sufficient > > to > guarantee ordering on the broker. You really need to send the 2nd > record only after the 1st record is acked. It's the difficultly of > achieving that in practice that's the true motivation for your KIP. > > > > I can see the attraction of using timestamps, but it would be > > helpful to > explain how that really solves the problem. When the producers are in > different processes on different machines you're relying on their > clocks being synchronized, which is a whole subject in itself. Even if > they're synchronized the resolution of System.currentTimeMillis() is > typically many milliseconds. If your producers are in different > threads of the same process that could be a real problem because it makes > ties quite likely. > > And you don't explain why it's OK to resolve ties using the offset. > > The > basis of your argument is that the offset is giving you the wrong answer. > > So it seems to me that using it as a tiebreaker is just narrowing > > the > chances of getting the wrong answer. Maybe none of this matters for > your use case, but I think it should be spelled out in the KIP, > because it surely would matter for similar use cases. > > > > Using a sequence at least removes the problem of ties, but the > interesting bit is now in how you deal with races between > threads/processes in getting a sequence number allocated (which is out > of scope of the KIP, I guess). > > How is resolving that race any simpler that resolving the motivating > race by waiting for the ack of the first record sent? > > > > Kind regards, > > > > Tom > > > > On Mon, Oct 21, 2019 at 9:06 PM Senthilnathan Muthusamy < > senth...@microsoft.com.invalid> wrote: > > > >> Hi All, > >> > >> We are bring back the KIP-280 to live with small correct for the > >> discussion & voting. Thanks to previous author Luis Cabral on the > >> KIP-280 initiation and we are taking over to complete and get it > >> into > 2.4... > >> > >> Below is the correction that we made to the existing KIP-280: > >> > >> * Allowing the compact strategy configuration at the topic level as > >> the log compaction is at the topic level and a broker can have > >> multiple topics. This allows the flexibility to have the strategy > >> at both broker level (i.e. for all topics within the broker) and > >> topic level (i.e. for a subset of topics within a broker) as well... > >> > >> KIP-280: > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fc > >> wi > >> k > >> i.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-280%253A%2BEnhanc > >> ed > >> % > >> 2Blog%2Bcompaction&data=02%7C01%7Csenthilm%40microsoft.com%7C68 > >> 6c > >> 3 > >> 2fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91ab2d7cd011db47%7C1% > >> 7C > >> 0 > >> %7C637073341017520406&sdata=KrRem2KWCBscHX963Ah8wZ%2Fj9dkhCeAa7 > >> Gs > >> 6 > >> XqJ%2F5SQ%3D&reserved=0 PULL REQUEST: > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fg > >> it > >> h > >> ub.com%2Fapache%2Fkafka%2Fpull%2F7528&data=02%7C01%7Csenthilm%4 > >> 0m > >> i > >> crosoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91 > >> ab > >> 2 > >> d7cd011db47%7C1%7C0%7C637073341017520406&sdata=bt32PgDUjJjpXohE > >> Wp > >> t > >> Fxv6mPERCwcRFlVROzinBtnk%3D&reserved=0 (unit test coverage in > >> progress) > >> > >> Previous Thread DISCUSS: > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fl > >> is > >> t > >> s.apache.org%2Fthread.html%2F79aa6e50d7c737ddf83455dd8063692a535a1a > >> fa > >> 5 > >> 58620fe1a1496d3%40%253Cdev.kafka.apache.org%253E&data=02%7C01%7 > >> Cs > >> e > >> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988b > >> f8 > >> 6 > >> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&sdata=XwcUW > >> WY > >> D > >> PV1nA%2BbkDGLFNlXZ5bysVblWUTDQEzAaKxM%3D&reserved=0 > >> Previous Thread VOTE: > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fl > >> is > >> t > >> s.apache.org%2Fthread.html%2Fb2ecd73ce849741f0c40b4f801c3f765058349 > >> 78 > >> 1 > >> 2713e240e1ac2b7%40%253Cdev.kafka.apache.org%253E&data=02%7C01%7 > >> Cs > >> e > >> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988b > >> f8 > >> 6 > >> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&sdata=8cKQc > >> Am > >> 2 > >> DDVGVLTKtciYKGMiI%2FgOADW6tam9nem4lsg%3D&reserved=0 > >> > >> Appreciate your timely action. > >> > >> PS: Initiating a separate thread as I was not able to reply to the > >> existing threads... > >> > >> Thanks, > >> Senthil > >> > > -- -- Guozhang