Hi Matthias,
How can this be done?
Kind Regards,
Luis
On Sunday, September 30, 2018, 9:10:01 PM GMT+2, Matthias J. Sax
<[email protected]> wrote:
Luis,
What is the status of this KIP?
I tend to agree, that introducing the feature only globally, might be
less useful (I would assume that people want to turn it on, on a
per-topic basis). As I am not familiar with the corresponding code, I
cannot judge the complexity to add topic level configs, however, it
seems to be worth to include it in the KIP.
-Matthias
On 9/21/18 1:59 PM, Bertus Greeff wrote:
> Someone pointed out to me that my scenario is also resolved by using Kafka
> transactions. Zombie fencing which is essentially my scenario was one of the
> scenarios that transactions were designed to solve. I was going to use the
> ideas of this KIP to solve it but using transactions seems even better
> because out of order messages never even make it into the topic. They are
> blocked by the broker.
>
> -----Original Message-----
> From: Guozhang Wang <[email protected]>
> Sent: Saturday, September 1, 2018 11:33 AM
> To: dev <[email protected]>
> Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction
>
> Hello Luis,
>
> Thanks for your thoughtful responses, here are my two cents:
>
> 1) I think having the new configs with per-topic granularity would not
> introduce much memory overhead or logic complexity, as all you need is to
> remember this at the topic metadata cache. If I've missed some details about
> the complexity, could you elaborate a bit more?
>
> 2) I agree with you: the current `ConfigDef.Validator` only scope on the
> validated config value itself, and hence cannot be dependent on another
> config.
>
> 4) I think Jun's point is that since we need the latest message in the log
> segment for the timestamp tracking, we cannot delete it actually: with offset
> based only policy, this is naturally guaranteed; but now with other policy,
> it is not guaranteed to never be compacted away, and hence we need to
> "enforce" to special-handle this case and not delete it.
>
>
>
> Guozhang
>
>
>
> Guozhang
>
>
> On Wed, Aug 29, 2018 at 9:25 AM, Luís Cabral <[email protected]>
> wrote:
>
>> Hi all,
>>
>> Since there has been a rejuvenated interest in this KIP, it felt
>> better to downgrade it back down to [DISCUSSION], as we aren't really
>> voting on it anymore.
>> I'll try to address the currently pending questions on the following
>> points, so please bear with me while we go through them all:
>>
>> 1) Configuration: Topic vs Global
>>
>> Here we all agree that having a configuration per-topic is the best
>> option. However, this is not possible with the current design of the
>> compaction solution. Yes, it is true that "some" properties that
>> relate to compaction are configurable per-topic, but those are only
>> the properties that act outside(!) of the actual compaction logic,
>> such as configuring the start-compaction trigger with "ratio" or
>> compaction duration with "lag.ms ".
>> This logic can, of course, be re-designed to suit our wishes, but this
>> is not a direct effort, and if we have spent months arguing about the
>> extra 8 bytes per record, for sure we would spend another few dozen
>> months discussing the memory impact that doing this change to the
>> properties will invariably have.
>> As such, I will limit this KIP to ONLY have these configurations globally.
>>
>> 2) Configuration: Fail-fast vs Fallback
>>
>>
>> Ideally, I would also like to prevent the application to start if the
>> configuration is somehow invalid.
>> However (another 'however'), the way the configuration is handled
>> prevents adding dependencies between them, so we can't add logic that
>> says "configuration X is invalid if configuration Y is so-and-such".
>> Again, this can be re-designed to add this feature to the
>> configuration logic, but it would again be a big change just by
>> itself, so this KIP is again limited to use ONLY what is already in place.
>>
>> 3) Documenting the memory impact on the KIP
>>
>> This is now added to the KIP, though this topic is more complicated
>> than 'memory impact'. E.g.: this change doesn't translate to an actual
>> memory impact, it just means that the compaction will potentially
>> encompass less records per execution.
>>
>> 4) Documenting how we deal with the last message in the log
>>
>> I have 2 interpretations for this request: "the last message in the log"
>> or "the last message with a shared key on the log"
>> For the former: there is no change to the logic on how the last
>> message is handled. Only the "tail" gets compacted, so the "head"
>> (which includes the last message) still keeps the last message
>>
>> 5) Documenting how the key deletion will be handled
>>
>> I'm having some trouble understanding this one; do you mean how keys
>> are deleted in general, or?
>>
>> Cheers,
>> Luis Cabral
>>
>> On Friday, August 24, 2018, 1:54:54 AM GMT+2, Jun Rao
>> <[email protected]>
>> wrote:
>>
>> Hi, Luis,
>>
>> Thanks for the reply. A few more comments below.
>>
>> 1. About the topic level configuration. It seems that it's useful for
>> the new configs to be at the topic level. Currently, the following
>> configs related to compaction are already at the topic level.
>>
>> min.cleanable.dirty.ratio
>> min.compaction.lag.ms
>> cleanup.policy
>>
>> 2. Have you documented the memory impact in the KIP?
>>
>> 3. Could you document how we deal with the last message in the log,
>> which is potentially cleanable now?
>>
>> 4. Could you document how key deletion will be handled?
>>
>> 10. As for Jason's proposal on CompactionStrategy, it does make the
>> feature more general. On the other hand, it will be useful not to
>> require user-level code if the compaction value only comes from the header.
>>
>> 20. "If compaction.strategy.header is chosen and
>> compaction.strategy.header is not set, the KIP falls back to offset."
>> I am wondering if it's better to just fail the configuration in the case.
>>
>> Jun
>>
>>
>>
>> On Thu, Aug 16, 2018 at 1:33 PM, Guozhang Wang <[email protected]> wrote:
>>
>>> Regarding "broker-agnostic of headers": there are some KIPs from
>>> Streams
>> to
>>> use headers for internal purposes as well, e.g. KIP-258 and KIP-213
>>> (I admit there may be a conflict with user space, but practically I
>>> think it is very rare). So I think we are very likely going to make
>>> Kafka
>> internals
>>> to be "headers-aware" anyways.
>>>
>>> Regarding the general API: I think it is a good idea in general, but
>>> it
>> may
>>> still have limits: note that right now our KIP enforce a header type
>>> to
>> be
>>> long, and we have a very careful discussion about the fall-back
>>> policy if header does not have the specified key or if the value is
>>> not long-typed; but if we enforce long type version in the
>>> interface, it would require users trying to customizing their
>>> compaction logic (think: based on some value payload field) to
>>> transform their fields to long as well. So I feel we can still go
>>> with the current proposed approach, and only consider
>> this
>>> general API if we observe it does have a general usage requirement.
>>> By
>> that
>>> time we can still extend the config values of "log.cleaner.compaction.
>>> strategy" to "offset, timestamp, header, myFuncName".
>>>
>>> @Bertus
>>>
>>> Thanks for your feedback, I believe the proposed config is indeed
>>> for
>> both
>>> global (for the whole broker) and per-topic, Luís can confirm if
>>> this is the case, and update the wiki page to make it clear.
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Thu, Aug 16, 2018 at 9:09 AM, Bertus Greeff <
>>> [email protected]> wrote:
>>>
>>>> I'm interested to know the status of this KIP. I see that the
>>>> status
>> is
>>>> "Voting". How long does this normally take?
>>>>
>>>> We want to use Kafka and this KIP provides exactly the log
>>>> compaction logic that we want for many of our projects.
>>>>
>>>> One piece of feedback that I have is that log.cleaner.compaction.
>>> strategy
>>>> and log.cleaner.compaction.strategy.header needs to be per topic.
>>>> The text of the KIP makes it sound that the config is only
>>>> available for
>> all
>>>> topics but this makes no sense. Different topics will need
>>>> different strategies and/or headers.
>>>>
>>>> From the KIP:
>>>> Provide the configuration for the individual topics None of the
>>>> configurations for log compaction are available at topic level, so
>>>> adding it there is not a part of this KIP
>>>>
>>>>
>>>>
>>>> On 2018/04/05 08:44:00, Luís Cabral <[email protected]> wrote:
>>>>> Hello all,>
>>>>> Starting a discussion for this feature.>
>>>>> KIP-280 :
>>>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2
>>>>> Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-&data
>>>>> =02%7C01%7Cbgreeff%40microsoft.com%7C42ac1e21f70c4b0a243108d6103
>>>>> 959ed%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6367142358482
>>>>> 30609&sdata=gtulffYXEDbf%2BNt1b1P%2FkGeDQvmcf2PZ9nQlo4oi%2Bo
>>>>> Q%3D&reserved=0
>>> 280%
>>>> 3A+Enhanced+log+compactionPull-4822 :
>>>> 3A+Enhanced+log+https://na01.safelinks.protection.outlook.com/?url
>>>> 3A+Enhanced+log+=https%3A%2F%2Fgithub.com%2Fapache%2Fkafk&data
>>>> 3A+Enhanced+log+=02%7C01%7Cbgreeff%40microsoft.com%7C42ac1e21f70c4
>>>> 3A+Enhanced+log+b0a243108d6103959ed%7C72f988bf86f141af91ab2d7cd011
>>>> 3A+Enhanced+log+db47%7C1%7C0%7C636714235848230609&sdata=s5lASk
>>>> 3A+Enhanced+log+fvK%2F21wmK8vsbdlXIXINfFFWZCtXzBLWhdCiA%3D&res
>>>> 3A+Enhanced+log+erved=0
>>>> a/pull/4822>
>>>>
>>>>> Kind Regards,Luís>
>>>>
>>>
>>>
>>>
>>> --
>>> -- Guozhang
>>>
>>
>
>
>
> --
> -- Guozhang
>