Re: Feasibility of per-partition instead of per-table bucket count

Jingsong Li Wed, 25 Feb 2026 22:33:23 -0800

Hi Mike,

For the second scenario, there is an option,
'commit.strict-mode.last-safe-snapshot'. If you use
RescaleAction, it sets this option to detect this scenario.
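
A simplified Python model of the idea behind such a check, for illustration only. It is not Paimon's code: `Snapshot`, `check_strict_mode` and `ConcurrentCommitError` are hypothetical names, and the real option may only inspect certain commit kinds. The rescale job remembers the last snapshot it considers safe, and at commit time it aborts if a snapshot from a different commit user has landed after that one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    commit_user: str


class ConcurrentCommitError(Exception):
    pass


def check_strict_mode(snapshots, last_safe_snapshot, commit_user):
    """Hypothetical model: abort if another commit user committed after
    the snapshot this committer treats as its last safe point."""
    for snap in snapshots:
        if snap.snapshot_id > last_safe_snapshot and snap.commit_user != commit_user:
            raise ConcurrentCommitError(
                f"snapshot {snap.snapshot_id} by {snap.commit_user} "
                f"landed after last safe snapshot {last_safe_snapshot}"
            )


# Scenario 2: the rescale job starts at snapshot 5, then a writer commits snapshot 6.
history = [
    Snapshot(5, "writer-a"),
    Snapshot(6, "writer-b"),
]
try:
    check_strict_mode(history, last_safe_snapshot=5, commit_user="rescale")
    rescale_aborted = False
except ConcurrentCommitError:
    rescale_aborted = True
```

With the check in place the rescale commit fails instead of silently replacing the writer's data; without a concurrent writer (only snapshot 5 present) it passes.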


Best,
Jingsong

On Thu, Feb 26, 2026 at 2:27 PM Mike Dias wrote:
>
> Thanks, Jingsong!
>
> It seems we already check that the number of buckets matches when
> committing, here:
> https://github.com/apache/paimon/blob/e1eeec56954c19ed78fd0bd4a46e0a332443397d/paimon-core/src/main/java/org/apache/paimon/operation/commit/ConflictDetection.java#L219.
>
> I think that should capture the first scenario where:
>
> writer starts
> rescale starts
> rescale commits
> writer commits -> fails because the number of buckets changed
>
> I don't think it would address the second scenario where:
>
> rescale starts
> writer starts
> writer commits
> rescale commits -> previous commit is overwritten
>
> Is my understanding correct? I'm not sure it is possible to detect the second
> scenario, though... users will need to ensure that no writer is
> running or started during the rescaling process.
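
A toy Python model of the two interleavings. All names are hypothetical, and `commit_append` only mimics the idea of the bucket-count check linked above, not its actual code. It shows why a commit-time check catches the first ordering but not the second: in the second, the writer commits while the bucket count is still unchanged, and the rescale then replaces the table with a rewrite of the files it read before the writer's commit.

```python
class Table:
    def __init__(self, num_buckets, files):
        self.num_buckets = num_buckets
        self.files = list(files)  # committed data files

    def commit_append(self, new_files, writer_buckets):
        # Mimics the commit-time check: reject files written for a different bucket count.
        if writer_buckets != self.num_buckets:
            raise RuntimeError("bucket count changed since the writer started")
        self.files.extend(new_files)

    def commit_overwrite(self, rewritten_files, new_buckets):
        # A rescale rewrites the data it read at start and replaces everything.
        self.files = list(rewritten_files)
        self.num_buckets = new_buckets


def scenario_1():
    table = Table(num_buckets=4, files=["f1"])
    writer_buckets = table.num_buckets   # writer starts
    rescale_input = list(table.files)    # rescale starts
    table.commit_overwrite([f + "@8" for f in rescale_input], new_buckets=8)  # rescale commits
    try:
        table.commit_append(["f2"], writer_buckets)  # writer commits
        return "writer committed"
    except RuntimeError:
        return "writer rejected"


def scenario_2():
    table = Table(num_buckets=4, files=["f1"])
    rescale_input = list(table.files)    # rescale starts
    writer_buckets = table.num_buckets   # writer starts
    table.commit_append(["f2"], writer_buckets)  # writer commits: count is still 4, check passes
    table.commit_overwrite([f + "@8" for f in rescale_input], new_buckets=8)  # rescale commits
    return table.files  # "f2" is gone
```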
>
>
> On Thu, Feb 26, 2026 at 3:24 PM Jingsong Li wrote:
>>
>> Hi Mike,
>>
>> This is a good question.
>>
>> As far as I know, Paimon does not strictly check that all partitions
>> have the same number of buckets. It is possible to support
>> different bucket counts for different partitions, but it is more complex. We
>> may need to scan the manifests when writing to ensure that the number
>> of buckets written to each partition is the same as before; otherwise
>> it will cause data correctness issues.
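
To make the correctness risk concrete, a toy Python model (hypothetical names; routing simplified to `key % num_buckets`, whereas real implementations hash the bucket key first, but any hash-modulo scheme has the same property). If two writers disagree on a partition's bucket count, the same primary key lands in two buckets, and a merge that only deduplicates within a bucket can no longer collapse it to one row.

```python
def bucket_for(key: int, num_buckets: int) -> int:
    # Stand-in routing: a key's bucket depends on the bucket count.
    return key % num_buckets


def visible_rows(records):
    # Primary-key merge only deduplicates within one bucket:
    # keep the latest value per (bucket, key).
    latest = {}
    for bucket, key, value in records:
        latest[(bucket, key)] = value
    return sorted((key, value) for (_bucket, key), value in latest.items())


# Two writers update key 5 in the same partition but disagree on its bucket count.
records = [
    (bucket_for(5, 4), 5, "v1"),  # writer using 4 buckets -> bucket 1
    (bucket_for(5, 8), 5, "v2"),  # writer using 8 buckets -> bucket 5
]
rows = visible_rows(records)  # key 5 appears twice
```

With a consistent bucket count both updates hit bucket 1 and only `(5, "v2")` remains visible.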
>>
>> Best,
>> Jingsong
>>
>> On Mon, Feb 16, 2026 at 1:19 PM Mike Dias via dev wrote:
>> >
>> > Hi Paimon maintainers,
>> >
>> > I'm looking to implement a change that would allow different partitions
>> > within a PK fixed-bucket table to have different bucket counts, primarily
>> > to support highly skewed partitions with more/fewer buckets.
>> >
>> > We would use dynamic buckets to handle skew, but we really need multiple
>> > writers writing to the same active partitions in both streaming and batch
>> > mode, which doesn't seem easy to support with dynamic buckets without
>> > coordinating changes to the bucket index file...
>> >
>> > On the fixed-buckets side, though, it seems we are in a good spot to
>> > implement per-partition bucketing, and this rescale doc
>> > <https://paimon.apache.org/docs/1.3/maintenance/rescale-bucket/> suggests
>> > we can already do that for partitions that aren't receiving writes.
>> > Unfortunately, our partitions are not time-based, and most of them are
>> > always receiving writes...
>> >
>> > Hence, we would need to adapt the current code so that writers look up
>> > each partition's bucket count from the manifests rather than relying on
>> > the global table bucket count.
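
A minimal Python sketch of that lookup, assuming each manifest entry records the total bucket count its file was written with. `ManifestEntry`, `load_partition_buckets` and `bucket_count_for` are hypothetical names, not Paimon's API. Existing partitions keep the count their files already use, new partitions fall back to the table default, and a partition with mixed counts is treated as an error rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Partition = Tuple[str, ...]


@dataclass(frozen=True)
class ManifestEntry:
    # Hypothetical, simplified view of a manifest entry.
    partition: Partition
    bucket: int
    total_buckets: int  # bucket count the file was written with


def load_partition_buckets(entries: Iterable[ManifestEntry]) -> Dict[Partition, int]:
    buckets: Dict[Partition, int] = {}
    for entry in entries:
        known = buckets.setdefault(entry.partition, entry.total_buckets)
        if known != entry.total_buckets:
            raise ValueError(
                f"partition {entry.partition} has mixed bucket counts "
                f"{known} and {entry.total_buckets}"
            )
    return buckets


def bucket_count_for(partition: Partition, partition_buckets: Dict[Partition, int],
                     table_default: int) -> int:
    # Existing partitions keep their count; unseen partitions use the table default.
    return partition_buckets.get(partition, table_default)


entries = [
    ManifestEntry(("us",), 0, 16),
    ManifestEntry(("us",), 7, 16),
    ManifestEntry(("eu",), 1, 4),
]
partition_buckets = load_partition_buckets(entries)
```

A writer would build this mapping once at initialization, which is exactly where question 2 below comes in: the mapping goes stale if a rescale commits afterwards.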
>> >
>> > That brings me to the following questions:
>> >
>> >    1. *Can we actually do this?* Are there architectural reasons why
>> >    bucket counts must be uniform across all partitions? Are there
>> >    assumptions elsewhere in the codebase that depend on a single global
>> >    bucket count?
>> >    2. *Concurrent writers:* If multiple writers are active, they each
>> >    independently load the partition-to-bucket mapping at initialization,
>> >    which creates a risk of inconsistency if a rescale operation completes
>> >    between the moments different writers load their mappings. This is not
>> >    too different from the existing behavior, but with a global bucket
>> >    count it is much easier to safeguard against. Do you have ideas on how
>> >    we could mitigate this issue or warn users about this pitfall?
>> >    3. *Read path:* On the read side, does the scan/split logic already
>> >    handle partitions with heterogeneous bucket counts, or would changes be
>> >    needed there as well?
>> >
>> >
>> > Any guidance on gotchas or prior art in this area would be greatly
>> > appreciated. Happy to share the full diff or open a draft PR if that would
>> > be easier to review.
>> >
>> > --
>> > Thanks,
>> > Mike Dias
>
>
>
> --
> Thanks,
> Mike Dias
