Re: [DISCUSS] KIP-1335: Bounded concurrency for partition reassignment via kafka-reassign-partitions.sh

Manan Gupta Sun, 31 May 2026 23:50:38 -0700

Hey Luke
Thank you for reviewing the proposal.

LC1:
Please excuse me if my explanation of the two different modes was unclear.


In non-incremental mode the tool walks the plan in steps. Each step submits
up to N partition reassignments, then waits until every partition in that
step has finished before it opens the next step. The slowest partition in
the current step holds up the entire next step.

In incremental mode N is not “how big each step is.” It is how many
partition reassignments from this plan may be active at the same time. The
tool keeps refilling up to N: whenever any single partition completes, it
can start the next one from the queue. There is no rule that the whole
group of N must finish together before new work starts.

Example: 10 partitions in sorted order P1 through P10, N equals 3.

Non-incremental: Step one submits P1 P2 P3 and waits until all three are
done. Step two submits P4 P5 P6 and waits until all three are done. Step
three submits P7 P8 P9 and waits until all three are done. Step four
submits P10 only. If P3 is slow, P4 cannot start until P3 finishes, even if
P1 and P2 are already done.

Incremental: The tool first submits P1 P2 P3 so three reassignments are
active. If P2 finishes first, it can submit P4 while P1 and P3 are still
running, still keeping three active when possible. It continues that way
until every partition in the plan has been submitted and the in-flight work
drains according to the tool semantics. If P3 is slow, P4 can still start
as soon as some other slot frees up.
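
To make the example above concrete, here is a small toy simulation of the
two modes. It is only a sketch and not the tool's code: the function names
and the durations (every partition takes 1 time unit, except the slow P3,
which takes 5) are assumptions made up for illustration.

```python
import heapq


def batch_schedule(durations, n):
    """Non-incremental: submit up to n, wait for the whole step, repeat."""
    start = {}
    now = 0
    names = list(durations)
    for i in range(0, len(names), n):
        step = names[i:i + n]
        for p in step:
            start[p] = now
        # The slowest partition in the step decides when the next step opens.
        now += max(durations[p] for p in step)
    return start, now


def incremental_schedule(durations, n):
    """Incremental: keep up to n active, refill as soon as any one finishes."""
    start = {}
    active = []  # min-heap of (finish_time, partition)
    now = 0
    for p in durations:
        if len(active) == n:
            # All slots are busy: wait for the earliest finisher to free one.
            now, _ = heapq.heappop(active)
        start[p] = now
        heapq.heappush(active, (now + durations[p], p))
    total = max((t for t, _ in active), default=0)
    return start, total


# Hypothetical durations: P1..P10 take 1 unit each, P3 is slow (5 units).
durations = {f"P{i}": 1 for i in range(1, 11)}
durations["P3"] = 5

batch_start, batch_total = batch_schedule(durations, 3)
incr_start, incr_total = incremental_schedule(durations, 3)

print("non-incremental: P4 starts at", batch_start["P4"], "total", batch_total)
print("incremental:     P4 starts at", incr_start["P4"], "total", incr_total)
```

With these made-up numbers, non-incremental starts P4 at time 5 and finishes
the plan at time 8, while incremental starts P4 at time 1 and finishes at
time 5. In both cases no more than three reassignments are active at once.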

How to choose: use non-incremental if you want clear steps and a strict
“this whole batch finished before the next batch starts” story. Use
incremental if you want steadier utilization when finish times differ and
you do not want one slow partition to delay the start of unrelated
partitions; the only limit is the cap of N active at once.

LC2:
These two are the same setting; I have updated the KIP to reflect that.

Regards
Manan Gupta


On Mon, Jun 1, 2026 at 9:52 AM Luke Chen <[email protected]> wrote:

> Hi Manan,
>
> Thanks for the KIP.
> This is a good improvement.
>
> Questions:
> 1. After reading the KIP, I still don't understand the difference between
> "incremental mode" and "non-incremental mode".
> From what I can see, they both run with reassignment-batch-size at one
> time.
> What's the difference between them?
> Could you explain more?
> Maybe some examples would be helpful to help users know the difference and
> how they choose them.
>
>
> 2. I see there are "INCREMENTAL_REASSIGNMENT_POLL_INTERVAL_MS" and
> "reassignment-poll-interval-ms".
> What's the difference between them?
>
>
> Thank you,
> Luke
>
>
> On Mon, May 25, 2026 at 11:06 PM Manan Gupta <[email protected]> wrote:
>
> > Hey TaiJuWu
> >
> > Thank you for reviewing the KIP, my response is inline.
> >
> > > TJ00: If we have multiple batch requests, how do you handle single
> > > batch failure?
> > - If a submit step fails, the tool returns immediately with errors and
> > does not enqueue the rest; partitions already submitted stay under the
> > controller’s reassignment as they do today.
> > - The process exits with a TerseException listing the failed partitions
> > and the error message from the broker/controller (the same pattern as a
> > single-shot execute when some alters fail).
> >
> > > TJ01: If there is a long time operation, how can the users know it
> > > still running instead of hang?
> > - Controller / cluster side: ongoing reassignments and replication
> > (metrics, kafka-reassign-partitions --list, Admin / JMX).
> > - --verify in another terminal shows progress toward the target.
> > Batch wait is mostly quiet; incremental is a bit chattier; true progress
> > is best observed from cluster state or --verify, not only from stdout
> > during the wait loop.
> >
> > Thanks,
> > Manan Gupta
> >
> > On Mon, May 25, 2026 at 6:06 PM TaiJu Wu <[email protected]> wrote:
> >
> > > Hi Manan,
> > >
> > > Thanks for the KIP. I just have a few questions.
> > >
> > > TJ00: If we have multiple batch requests, how do you handle a single
> > > batch failure?
> > >
> > > TJ01: If there is a long-running operation, how can users know it is
> > > still running rather than hung?
> > >
> > > Thanks,
> > > TaiJuWu
> > >
> > >
> > >
> > > On Mon, May 18, 2026 at 6:09 PM Manan Gupta <[email protected]> wrote:
> > >
> > > > Hey Kamal
> > > >
> > > > Thank you for your comments.
> > > >
> > > > > Should we have a configurable list poll interval?
> > > > The current fixed interval of 500 ms should not degrade the
> > > > controller, but I agree that operators should be able to change it.
> > > > I have updated the KIP to add another parameter,
> > > > reassignment-poll-interval-ms, which overrides the 500 ms default.
> > > >
> > > > > Shall we extend the batching logic to also kafka-leader-election
> > > > > script?
> > > > Good point, I will pick this up in a separate KIP as a follow-up to
> > > > this one.
> > > >
> > > > Thanks,
> > > > Manan
> > > >
> > > > On Mon, May 18, 2026 at 2:52 PM Kamal Chandraprakash <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi Manan,
> > > > >
> > > > > Thanks for improving the user-facing tools! Overall LGTM. Few
> > > > > questions:
> > > > >
> > > > > 1. Should we have a configurable list poll interval? With 500ms,
> > > > > does it poll the controller often to list the currently running
> > > > > reassignments for large partitions?
> > > > > 2. Shall we extend the batching logic to also kafka-leader-election
> > > > > script?
> > > > > It will be useful when running with --all-topic-partitions.
> > > > >
> > > > > Thanks,
> > > > > Kamal
> > > > >
> > > > >
> > > > > On Mon, May 11, 2026 at 8:55 AM Manan Gupta <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hello
> > > > > >
> > > > > > Gentle reminder to review the KIP.
> > > > > >
> > > > > > Thanks,
> > > > > > Manan
> > > > > >
> > > > > > On Wed, May 6, 2026 at 7:52 PM Manan Gupta <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > This email starts the discussion thread for *KIP-1335: Bounded
> > > > > > > concurrency for partition reassignment via
> > > > > > > kafka-reassign-partitions.sh*.
> > > > > > > The proposal adds optional reassignment-batch-size and
> > > > > > > incremental parameters to kafka-reassign-partitions.sh so
> > > > > > > operators can cap how many partition reassignments are
> > > > > > > submitted or kept in flight at once using the existing Admin
> > > > > > > API.
> > > > > > >
> > > > > > > I would appreciate your initial thoughts and feedback on the
> > > > > > > proposal.
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/x/8ZAmGQ
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Manan
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
