Thanks for picking up this KIP, Senthil.

(1) As far as I remember, the main issue of the original proposal was a
missing topic level configuration for the compaction strategy. With this
being addressed, I am in favor of this KIP.

(2) With regard to (1), it seems we would need a new topic level config
`compaction.strategy`, and `log.cleaner.compaction.strategy` would be
the default strategy (ie, broker level config) if a topic does not
overwrite it?

(3) Why did you remove `log.cleaner.compaction.strategy.header`
parameter and change the accepted values of
`log.cleaner.compaction.strategy` to "header.<key>" instead of keeping
"header"? The original approach seems to be cleaner, and I think this
was discussed on the original discuss thread already.

(4) Nit: For the "timestamp" compaction strategy you changed the KIP to

-> `The record [create] timestamp`

This is miss leading IMHO, because it depends on the broker/log
configuration `(log.)message.timestamp.type` that can either be
`CreateTime` or `LogAppendTime` what the actual record timestamp is. I
would just remove "create" to keep it unspecified.

(5) Nit: the section "Public Interfaces" should list the newly
introduced configs -- configuration parameters are a public interface.

(6) What do you mean by "first level header lookup"? The term "first
level" indicates some hierarchy, but headers don't have any hierarchy --
it's just a list of key-value pairs? If you mean the _order_ of the
headers, ie, pick the first header in the list that matches the key,
please rephrase it to make it clearer.



@Tom: I agree with all you are saying, however, I still think that this
KIP will improve the overall situation, because everything you pointed
out is actually true with offset based compaction, too.

The KIP is not a silver bullet that solves all issue for interleaved
writes, but I personally believe, it's a good improvement.



-Matthias


On 10/30/19 9:45 AM, Senthilnathan Muthusamy wrote:
> Hi,
> 
> Please let me know if anyone has any questions on this updated KIP-280...
> 
> Thanks,
> 
> Senthil
> 
> -----Original Message-----
> From: Senthilnathan Muthusamy <senth...@microsoft.com.INVALID> 
> Sent: Monday, October 28, 2019 11:36 PM
> To: dev@kafka.apache.org
> Subject: RE: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hi Tom,
> 
> Sorry for the delayed response.
> 
> Regarding the fall back to offset decision for both timestamp & header value 
> is based on the previous author discuss 
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fthread.html%2Ff44317eb6cd34f91966654c80509d4a457dbbccdd02b86645782be67%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7Csenthilm%40microsoft.com%7Cfce3eed73837437b5d6b08d75c3a4692%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637079277707943399&amp;sdata=pTrTHHOD5KCtLM2YXSMWmPBMx%2BI9L5PeEe5QEcwzkKA%3D&amp;reserved=0
>  and as per the discussion, it is really required to avoid duplicates.
> 
> And the timestamp strategy is from the original KIP author and we are keeping 
> it as is.
> 
> Finally on the sequence order guarantee by the producer, it is not feasible 
> on waiting for ack in async / multi-threads/processes scenarios and hence the 
> header sequence based compact strategy with producer's responsibility to have 
> a unique sequence generation for the topic-partition-key level.
> 
> Hoping this clarifies all your questions. Please let us know if you have any 
> further questions.
> 
> @Guozhang Wang / @Matthias J. Sax, I see you both had a detail discussion on 
> the original KIP with previous author and it would great to hear your inputs 
> as well.
> 
> Thanks,
> Senthil
> 
> -----Original Message-----
> From: Tom Bentley <tbent...@redhat.com>
> Sent: Tuesday, October 22, 2019 2:32 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
> 
> Hi Senthilnathan,
> 
> In the motivation isn't it a little misleading to say "On the producer side, 
> we clearly preserve an order for the two messages, <K1, V1> <K1,
> V2>"? IMHO, the semantics of the producer are clear that having an 
> V2>observed
> order of sending records from different producers is not sufficient to 
> guarantee ordering on the broker. You really need to send the 2nd record only 
> after the 1st record is acked. It's the difficultly of achieving that in 
> practice that's the true motivation for your KIP.
> 
> I can see the attraction of using timestamps, but it would be helpful to 
> explain how that really solves the problem. When the producers are in 
> different processes on different machines you're relying on their clocks 
> being synchronized, which is a whole subject in itself. Even if they're 
> synchronized the resolution of System.currentTimeMillis() is typically many 
> milliseconds. If your producers are in different threads of the same process 
> that could be a real problem because it makes ties quite likely.
> And you don't explain why it's OK to resolve ties using the offset. The basis 
> of your argument is that the offset is giving you the wrong answer.
> So it seems to me that using it as a tiebreaker is just narrowing the chances 
> of getting the wrong answer. Maybe none of this matters for your use case, 
> but I think it should be spelled out in the KIP, because it surely would 
> matter for similar use cases.
> 
> Using a sequence at least removes the problem of ties, but the interesting 
> bit is now in how you deal with races between threads/processes in getting a 
> sequence number allocated (which is out of scope of the KIP, I guess).
> How is resolving that race any simpler that resolving the motivating race by 
> waiting for the ack of the first record sent?
> 
> Kind regards,
> 
> Tom
> 
> On Mon, Oct 21, 2019 at 9:06 PM Senthilnathan Muthusamy 
> <senth...@microsoft.com.invalid> wrote:
> 
>> Hi All,
>>
>> We are bring back the KIP-280 to live with small correct for the 
>> discussion & voting. Thanks to previous author Luis Cabral on the
>> KIP-280 initiation and we are taking over to complete and get it into 2.4...
>>
>> Below is the correction that we made to the existing KIP-280:
>>
>>   *   Allowing the compact strategy configuration at the topic level as
>> the log compaction is at the topic level and a broker can have 
>> multiple topics. This allows the flexibility to have the strategy at 
>> both broker level (i.e. for all topics within the broker) and topic 
>> level (i.e. for a subset of topics within a broker) as well...
>>
>> KIP-280:
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwik
>> i.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-280%253A%2BEnhanced%
>> 2Blog%2Bcompaction&amp;data=02%7C01%7Csenthilm%40microsoft.com%7C686c3
>> 2fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0
>> %7C637073341017520406&amp;sdata=KrRem2KWCBscHX963Ah8wZ%2Fj9dkhCeAa7Gs6
>> XqJ%2F5SQ%3D&amp;reserved=0 PULL REQUEST: 
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
>> ub.com%2Fapache%2Fkafka%2Fpull%2F7528&amp;data=02%7C01%7Csenthilm%40mi
>> crosoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf86f141af91ab2
>> d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=bt32PgDUjJjpXohEWpt
>> Fxv6mPERCwcRFlVROzinBtnk%3D&amp;reserved=0 (unit test coverage in
>> progress)
>>
>> Previous Thread DISCUSS:
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist
>> s.apache.org%2Fthread.html%2F79aa6e50d7c737ddf83455dd8063692a535a1afa5
>> 58620fe1a1496d3%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7Cse
>> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf86
>> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=XwcUWWYD
>> PV1nA%2BbkDGLFNlXZ5bysVblWUTDQEzAaKxM%3D&amp;reserved=0
>> Previous Thread VOTE:
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist
>> s.apache.org%2Fthread.html%2Fb2ecd73ce849741f0c40b4f801c3f765058349781
>> 2713e240e1ac2b7%40%253Cdev.kafka.apache.org%253E&amp;data=02%7C01%7Cse
>> nthilm%40microsoft.com%7C686c32fa4a554d61ae1408d756d409f6%7C72f988bf86
>> f141af91ab2d7cd011db47%7C1%7C0%7C637073341017520406&amp;sdata=8cKQcAm2
>> DDVGVLTKtciYKGMiI%2FgOADW6tam9nem4lsg%3D&amp;reserved=0
>>
>> Appreciate your timely action.
>>
>> PS: Initiating a separate thread as I was not able to reply to the 
>> existing threads...
>>
>> Thanks,
>> Senthil
>>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to