Re: [DISCUSS] KIP-280: Enhanced log compaction

Eric Azama Mon, 04 Nov 2019 11:31:45 -0800

Hi Senthilnathan,

Regarding Matthias's point 6, what is the reasoning for choosing the first
occurrence of the configured header? I believe this corresponds to the
oldest value for given key. If there are multiple values for a key, it
seems more intuitive that the newest value is the one that should be used
for compaction.


Thanks,
Eric

On Mon, Nov 4, 2019 at 11:00 AM Guozhang Wang <[email protected]> wrote:

> Hello Senthilnathan,
>
> Thanks for revamping on the KIP. I have only one comment about the wiki
> otherwise LGTM.
>
> 1. We should emphasize that the newly introduced config yields to the
> existing "log.cleanup.policy", i.e. if the latter's value is `delete` not
> `compact`, then the previous config would be ignored.
>
>
> Guozhang
>
> On Mon, Nov 4, 2019 at 9:52 AM Senthilnathan Muthusamy
> <[email protected]> wrote:
>
> > Hi all,
> >
> > I will start the vote thread shortly for this updated KIP. If there are
> > any more thoughts I would love to hear them.
> >
> > Thanks,
> > Senthil
> >
> > -----Original Message-----
> > From: Senthilnathan Muthusamy <[email protected]>
> > Sent: Thursday, October 31, 2019 3:51 AM
> > To: [email protected]
> > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Hi Matthias
> >
> > Thanks for the response.
> >
> > (1) Yes
> >
> > (2) Yes, and the config name will be the same (i.e.
> > `log.cleaner.compaction.strategy` &
> > `log.cleaner.compaction.strategy.header`) at broker level and topic level
> > (to override broker level default compact strategy). Please let me know
> if
> > we need to keep it in different naming convention. Note: Broker level
> > (which will be in the server.properties) configuration is optional and
> > default it to offset. Topic level configuration will be default to broker
> > level config...
> >
> > (3) By this new way, it avoids another config parameter and also in
> > feature if any new strategy like header need addition info, no additional
> > config required. As this got discussed already and agreed to have
> separate
> > config, I will revert it. KIP updated...
> >
> > (4) Done
> >
> > (5) Updated
> >
> > (6) Updated to pick the first header in the list
> >
> > Please let me know if you have any other questions.
> >
> > Thanks,
> > Senthil
> >
> > -----Original Message-----
> > From: Matthias J. Sax <[email protected]>
> > Sent: Thursday, October 31, 2019 12:13 AM
> > To: [email protected]
> > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> >
> > Thanks for picking up this KIP, Senthil.
> >
> > (1) As far as I remember, the main issue of the original proposal was a
> > missing topic level configuration for the compaction strategy. With this
> > being addressed, I am in favor of this KIP.
> >
> > (2) With regard to (1), it seems we would need a new topic level config
> > `compaction.strategy`, and `log.cleaner.compaction.strategy` would be the
> > default strategy (ie, broker level config) if a topic does not overwrite
> it?
> >
> > (3) Why did you remove `log.cleaner.compaction.strategy.header`
> > parameter and change the accepted values of
> > `log.cleaner.compaction.strategy` to "header.<key>" instead of keeping
> > "header"? The original approach seems to be cleaner, and I think this was
> > discussed on the original discuss thread already.
> >
> > (4) Nit: For the "timestamp" compaction strategy you changed the KIP to
> >
> > -> `The record [create] timestamp`
> >
> > This is miss leading IMHO, because it depends on the broker/log
> > configuration `(log.)message.timestamp.type` that can either be
> > `CreateTime` or `LogAppendTime` what the actual record timestamp is. I
> > would just remove "create" to keep it unspecified.
> >
> > (5) Nit: the section "Public Interfaces" should list the newly introduced
> > configs -- configuration parameters are a public interface.
> >
> > (6) What do you mean by "first level header lookup"? The term "first
> > level" indicates some hierarchy, but headers don't have any hierarchy --
> > it's just a list of key-value pairs? If you mean the _order_ of the
> > headers, ie, pick the first header in the list that matches the key,
> please
> > rephrase it to make it clearer.
> >
> >
> >
> > @Tom: I agree with all you are saying, however, I still think that this
> > KIP will improve the overall situation, because everything you pointed
> out
> > is actually true with offset based compaction, too.
> >
> > The KIP is not a silver bullet that solves all issue for interleaved
> > writes, but I personally believe, it's a good improvement.
> >
> >
> >
> > -Matthias
> >
> >
> > On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote:
> > > Hi,
> > >
> > > Please let me know if anyone has any questions on this updated
> KIP-280...
> > >
> > > Thanks,
> > >
> > > Senthil
> > >
> > > -----Original Message-----
> > > From: Senthilnathan Muthusamy <[email protected]>
> > > Sent: Monday, October 28, 2019 11:36 PM
> > > To: [email protected]
> > > Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> > >
> > > Hi Tom,
> > >
> > > Sorry for the delayed response.
> > >
> > > Regarding the fall back to offset decision for both timestamp & header
> > value is based on the previous author discuss
> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fthread.html%2Ff44317eb6cd34f91966654c80509d4a457dbbccdd02b86645782be67%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7Csenthilm%40microsoft.com%7Cb5c596140be1436e9fb708d75df04714%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637081159484181661&amp;sdata=%2Fap4F2CdPQe02wNDGkzjzIrxOQRTa2KraQE75dpjTzE%3D&amp;reserved=0
> > and as per the discussion, it is really required to avoid duplicates.
> > >
> > > And the timestamp strategy is from the original KIP author and we are
> > keeping it as is.
> > >
> > > Finally on the sequence order guarantee by the producer, it is not
> > feasible on waiting for ack in async / multi-threads/processes scenarios
> > and hence the header sequence based compact strategy with producer's
> > responsibility to have a unique sequence generation for the
> > topic-partition-key level.
> > >
> > > Hoping this clarifies all your questions. Please let us know if you
> have
> > any further questions.
> > >
> > > @Guozhang Wang / @Matthias J. Sax, I see you both had a detail
> > discussion on the original KIP with previous author and it would great to
> > hear your inputs as well.
> > >
> > > Thanks,
> > > Senthil
> > >
> > > -----Original Message-----
> > > From: Tom Bentley <[email protected]>
> > > Sent: Tuesday, October 22, 2019 2:32 AM
> > > To: [email protected]
> > > Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> > >
> > > Hi Senthilnathan,
> > >
> > > In the motivation isn't it a little misleading to say "On the producer
> > > side, we clearly preserve an order for the two messages, <K1, V1> <K1,
> > > V2>"? IMHO, the semantics of the producer are clear that having an
> > > V2>observed
> > > order of sending records from different producers is not sufficient to
> > guarantee ordering on the broker. You really need to send the 2nd record
> > only after the 1st record is acked. It's the difficultly of achieving
> that
> > in practice that's the true motivation for your KIP.
> > >
> > > I can see the attraction of using timestamps, but it would be helpful
> to
> > explain how that really solves the problem. When the producers are in
> > different processes on different machines you're relying on their clocks
> > being synchronized, which is a whole subject in itself. Even if they're
> > synchronized the resolution of System.currentTimeMillis() is typically
> many
> > milliseconds. If your producers are in different threads of the same
> > process that could be a real problem because it makes ties quite likely.
> > > And you don't explain why it's OK to resolve ties using the offset. The
> > basis of your argument is that the offset is giving you the wrong answer.
> > > So it seems to me that using it as a tiebreaker is just narrowing the
> > chances of getting the wrong answer. Maybe none of this matters for your
> > use case, but I think it should be spelled out in the KIP, because it
> > surely would matter for similar use cases.
> > >
> > > Using a sequence at least removes the problem of ties, but the
> > interesting bit is now in how you deal with races between
> threads/processes
> > in getting a sequence number allocated (which is out of scope of the
> KIP, I
> > guess).
> > > How is resolving that race any simpler that resolving the motivating
> > race by waiting for the ack of the first record sent?
> > >
> > > Kind regards,
> > >
> > > Tom
> > >
> > > On Mon, Oct 21, 2019 at 9:06 PM Senthilnathan Muthusamy <
> > [email protected]> wrote:
> > >
> > >> Hi All,
> > >>
> > >> We are bring back the KIP-280 to live with small correct for the
> > >> discussion & voting. Thanks to previous author Luis Cabral on the
> > >> KIP-280 initiation and we are taking over to complete and get it into
> > 2.4...
> > >>
> > >> Below is the correction that we made to the existing KIP-280:
> > >>
> > >>   *   Allowing the compact strategy configuration at the topic level
> as
> > >> the log compaction is at the topic level and a broker can have
> > >> multiple topics. This allows the flexibility to have the strategy at
> > >> both broker level (i.e. for all topics within the broker) and topic
> > >> level (i.e. for a subset of topics within a broker) as well...
> > >>
> > >> KIP-280:
> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwi
> > >> k
> > >> i.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-280%253A%2BEnhanced
> > >> %
> > >> 2Blog%2Bcompaction&amp;data=02%7C01%7Csenthilm%40microsoft.com%7C686c
> > >> 3
> > >> 2fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
> > >> 0
> > >> %7C637073341017520406&amp;sdata=KrRem2KWCBscHX963Ah8wZ%2Fj9dkhCeAa7Gs
> > >> 6
> > >> XqJ%2F5SQ%3D&amp;reserved=0 PULL REQUEST:
> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit
> > >> h
> > >> ub.com%2Fapache%2Fkafka%2Fpull%2F7528&amp;data=02%7C01%7Csenthilm%40m
> > >> i
> > >> crosoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91ab
> > >> 2
> > >> d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=bt32PgDUjJjpXohEWp
> > >> t
> > >> Fxv6mPERCwcRFlVROzinBtnk%3D&amp;reserved=0 (unit test coverage in
> > >> progress)
> > >>
> > >> Previous Thread DISCUSS:
> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flis
> > >> t
> > >> s.apache.org%2Fthread.html%2F79aa6e50d7c737ddf83455dd8063692a535a1afa
> > >> 5
> > >> 58620fe1a1496d3%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7Cs
> > >> e
> > >> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf8
> > >> 6
> > >> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=XwcUWWY
> > >> D
> > >> PV1nA%2BbkDGLFNlXZ5bysVblWUTDQEzAaKxM%3D&amp;reserved=0
> > >> Previous Thread VOTE:
> > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flis
> > >> t
> > >> s.apache.org%2Fthread.html%2Fb2ecd73ce849741f0c40b4f801c3f76505834978
> > >> 1
> > >> 2713e240e1ac2b7%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7Cs
> > >> e
> > >> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf8
> > >> 6
> > >> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=8cKQcAm
> > >> 2
> > >> DDVGVLTKtciYKGMiI%2FgOADW6tam9nem4lsg%3D&amp;reserved=0
> > >>
> > >> Appreciate your timely action.
> > >>
> > >> PS: Initiating a separate thread as I was not able to reply to the
> > >> existing threads...
> > >>
> > >> Thanks,
> > >> Senthil
> > >>
> >
> >
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-280: Enhanced log compaction

Reply via email to