Hi Chia-Ping,

KAFKA-15047 makes segments deletable when there is only one segment for tiered 
storage. 

If we disable that action, active segments that have exceeded their
`retention.ms` may never be deleted. The issue becomes more pronounced with a
large number of partitions, since each partition retains one active segment.

This problem is especially significant in cloud environments, as it directly 
impacts billing costs.

Perhaps we could add special validation logic for `retention.ms` instead?
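
To make the idea concrete, here is a minimal sketch of what such a check might look like. It is not Kafka's actual validation code. It assumes that negative values keep their meaning of "retention disabled" (as Chia-Ping noted) and that only non-negative values are compared against a lower bound. The class name, the method names, and the 1000 ms bound are all hypothetical, and it throws a plain IllegalArgumentException rather than Kafka's ConfigException so that it stays self-contained.

```java
// Hypothetical sketch only: not Kafka's actual retention.ms validation.
final class RetentionMsValidator {

    // Hypothetical lower bound; the thread does not propose a concrete value.
    static final long HYPOTHETICAL_MIN_RETENTION_MS = 1_000L;

    // Returns true if the value is acceptable under the assumed rule.
    static boolean isValid(long retentionMs, long minRetentionMs) {
        if (retentionMs < 0) {
            // Assumption: negative values disable retention and stay allowed.
            return true;
        }
        // Non-negative values must respect the lower bound.
        return retentionMs >= minRetentionMs;
    }

    // Throws if the value violates the assumed rule.
    static void validate(long retentionMs, long minRetentionMs) {
        if (!isValid(retentionMs, minRetentionMs)) {
            throw new IllegalArgumentException(
                "retention.ms must be negative (disabled) or at least "
                    + minRetentionMs + ", got " + retentionMs);
        }
    }

    public static void main(String[] args) {
        long[] candidates = {-1L, 0L, 500L, 1_000L, 604_800_000L};
        for (long candidate : candidates) {
            System.out.println(candidate + " -> "
                + isValid(candidate, HYPOTHETICAL_MIN_RETENTION_MS));
        }
    }
}
```

With a rule of this shape, disabling retention keeps working, while very small positive values that would cause frequent segment rolling are rejected up front.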

Best, Lan

At 2025-08-08 18:10:19, "Chia-Ping Tsai" <chia7...@apache.org> wrote:
>hi Jun
>
>Negative values are allowed for `retention.ms`, so setting a lower bound breaks
>the behavior of disabling retention.
>
>If the lower bound is intended to avoid frequent segment rolling, we could 
>consider disabling the action: `retention.ms` would be ignored when there is 
>only one segment.
>
>Best,
>Chia-Ping
>
>On 2025/08/06 16:28:00 Jun Rao wrote:
>> Hi, Divij,
>> 
>> Another comment regarding the changes to segment.ms.
>> 
>> Currently, retention.ms has the following doc:
>> "Additionally, retention.ms configuration operates independently of
>> "segment.ms" and "segment.bytes" configurations. Moreover, it triggers the
>> rolling of new segment if the retention.ms condition is satisfied."
>> 
>> So, if we set a lower bound for segment.ms, should we do the same for
>> retention.ms too?
>> 
>> Thanks,
>> 
>> Jun
>> 
>> On Thu, Dec 5, 2024 at 1:26 AM Divij Vaidya <divijvaidy...@gmail.com> wrote:
>> 
>> > Hi Ismael
>> >
>> > You are right. In hindsight, I should have started deprecation with one of
>> > the earlier 3.x versions.
>> >
>> > But since that ship has already sailed, we can either wait for 5.0 to make
>> > the actual change with just a deprecation notice in 4.x OR we can make the
>> > change starting in 4.0 itself.
>> >
>> > I think that the latter (changing in 4.0) is desirable because the changes
>> > are low risk. I will elaborate why I think so.
>> >
>> > The practice of having a deprecation period achieves two purposes:
>> > 1. provide users with ample time to prepare for upgrades where code changes
>> > may be required.
>> > 2. allow users to measure the impact of the changes prior to moving to 4.0.
>> > Testing on existing versions is important since in 4.0, it will be
>> > difficult to isolate the impact of these changes due to the presence of
>> > many other changes.
>> >
>> > Purpose 2 can still be achieved with the current proposal since all the
>> > proposed changes could be tested with existing Kafka versions. There are no
>> > new configurations which have been added in the proposal which cannot be
>> > tested with older versions.
>> >
>> > For point 1, with the current proposal, 99% of users will require no code
>> > change since we do not anticipate existing workloads to be using the now
>> > removed lower bounds for the configurations. Here's a breakdown of risk at
>> > a per-config level.
>> >
>> > 1. The changes in configuration to prevent small segments are expected to
>> > be compatible with 99% of production workloads. This is because Kafka will
>> > not be functional with extremely low segment size due to OS imposed limits
>> > on file descriptors and mmaps. Hence, the risk of breaking existing
>> > workloads by adding these new constraints is very low. As noted in the
>> > discussion, we may impact some testing scenarios but there are quick ways
>> > to work around them. We will also provide examples on how to write tests
>> > where you expect a quick segment roll.
>> > 2. The config changes which are associated with Tiered Storage (thread pool
>> > configs) were introduced in 3.9. The customers who use the now-removed
>> > "-1" value will have to make a change before upgrading to 4.0. I agree that
>> > this does cause user friction during upgrade. To reduce the chance of
>> > unintentional errors, we will add a "Pre-Upgrade Validation Tool".
>> > 3. The change associated with the num recovery thread is an internal change
>> > to the broker. I do not anticipate any negative side effects of changing
>> > this default. It also does not require any code change from the customer
>> > since no new constraint was added.
>> > 4. The `linger.ms` change is potentially the one with the largest impact on
>> > existing production workloads. I would assume that users who
>> > intentionally set linger.ms = 0 are expert users (since this is quite an
>> > unusual setting to get to behave correctly) and these users would pay
>> > attention to upgrade notes (similar to how we made acks=all the default in
>> > producers starting in 3.0). For users who unintentionally set it to 0, I
>> > believe that the new setting will work better for their workloads.
>> >
>> > I will update the KIP with the above explanation.
>> >
>> > Looking forward to hearing your thoughts.
>> >
>> > --
>> > Divij Vaidya
>> >
>> >
>> >
>> > On Thu, Dec 5, 2024 at 4:01 AM Ismael Juma <m...@ismaeljuma.com> wrote:
>> >
>> > > Hi Divij,
>> > >
>> > > The KIP didn't state this, but the usual practice is to have a
>> > > deprecation period before we make incompatible changes. Why did we
>> > > reject this option? We should mention that explicitly in the KIP.
>> > >
>> > > Ismael
>> > >
>> > > On Tue, Nov 19, 2024, 2:55 AM Divij Vaidya <divijvaidy...@gmail.com>
>> > > wrote:
>> > >
>> > > > KT1 - That is right. We will throw a ConfigException. That is why this
>> > > > change is considered backward incompatible. To be honest, given the
>> > > > nature of the suggested changes, I don't see any valid use case in
>> > > > which a user would have a value that becomes invalid under the new
>> > > > constraints.
>> > > >
>> > > >
>> > > > --
>> > > > Divij Vaidya
>> > > >
>> > > >
>> > > >
>> > > > On Tue, Nov 19, 2024 at 2:21 AM Kirk True <k...@kirktrue.pro> wrote:
>> > > >
>> > > > > Hi Divij,
>> > > > >
>> > > > > Thanks for the KIP!
>> > > > >
>> > > > > My only question:
>> > > > >
>> > > > > KT1. In the case where we change the constraints so that a user's
>> > > > > previously valid configuration is now invalid, do we do anything
>> > > > > other than throw a ConfigException?
>> > > > >
>> > > > > Thanks,
>> > > > > Kirk
>> > > > >
>> > > > > On Mon, Nov 18, 2024, at 2:13 AM, Divij Vaidya wrote:
>> > > > > > Hey folks
>> > > > > >
>> > > > > > With 4.0, we have an opportunity to reset the default values and
>> > > > > > add constraints in the configurations based on our learnings
>> > > > > > since 3.0.
>> > > > > >
>> > > > > > Here's a KIP which modifies defaults for some properties and
>> > > > > > modifies the constraints for a few others.
>> > > > > >
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations
>> > > > > >
>> > > > > > Looking forward to your feedback.
>> > > > > >
>> > > > > > (Previous discussion thread on this topic -
>> > > > > > https://lists.apache.org/thread/3dx9mdmsqf8pko9xdmhks80k96g650zp )
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> 
