Thanks, Jingsong!

It seems we already check that the number of buckets is equal when committing, here:
https://github.com/apache/paimon/blob/e1eeec56954c19ed78fd0bd4a46e0a332443397d/paimon-core/src/main/java/org/apache/paimon/operation/commit/ConflictDetection.java#L219
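
To make it concrete what a commit-time bucket-count check can and cannot catch, here is a toy Python model. It is only a sketch under my own assumptions, not Paimon's actual code: `ToyTable`, `commit_append`, `commit_overwrite` and `BucketCountConflict` are hypothetical names, and the rescale is modelled as a plain overwrite of the partition built from the files it read when it started. It replays the two interleavings I describe below.

```python
from dataclasses import dataclass, field


class BucketCountConflict(Exception):
    """Raised when a writer commits with a stale bucket count."""


@dataclass
class ToyTable:
    # Hypothetical stand-in for a single partition of a fixed-bucket table.
    num_buckets: int
    files: list = field(default_factory=list)

    def commit_append(self, writer_buckets, new_files):
        # Commit-time check: reject a writer that planned with a different bucket count.
        if writer_buckets != self.num_buckets:
            raise BucketCountConflict(
                f"writer used {writer_buckets} buckets, table has {self.num_buckets}"
            )
        self.files.extend(new_files)

    def commit_overwrite(self, new_buckets, rewritten_files):
        # Rescale modelled as an overwrite: replaces every file in the partition.
        self.num_buckets = new_buckets
        self.files = list(rewritten_files)


def scenario_one():
    table = ToyTable(num_buckets=4, files=["a"])
    writer_buckets = table.num_buckets                # writer starts
    rescale_input = list(table.files)                 # rescale starts
    table.commit_overwrite(8, [f + "-rescaled" for f in rescale_input])  # rescale commits
    try:
        table.commit_append(writer_buckets, ["b"])    # writer commits
    except BucketCountConflict:
        return "writer rejected", table.files
    return "writer accepted", table.files


def scenario_two():
    table = ToyTable(num_buckets=4, files=["a"])
    rescale_input = list(table.files)                 # rescale starts
    writer_buckets = table.num_buckets                # writer starts
    table.commit_append(writer_buckets, ["b"])        # writer commits, check passes
    table.commit_overwrite(8, [f + "-rescaled" for f in rescale_input])  # rescale commits
    return "writer accepted", table.files


print(scenario_one())
print(scenario_two())
```

In this model the first interleaving ends with the writer being rejected, while in the second the writer's file "b" is accepted and then silently dropped by the overwrite, because the check only runs on the writer's side and the bucket count had not changed yet at that point.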

I think that should capture the first scenario, where:
- writer starts
- rescale starts
- rescale commits
- writer commits -> fails because the number of buckets changed

I don't think it would address the second scenario, where:
- rescale starts
- writer starts
- writer commits
- rescale commits -> previous commit is overwritten

Is my understanding correct?

Not sure if it is possible to detect the second scenario, though... users will need to ensure that no writer is running or started during the rescaling process.

On Thu, Feb 26, 2026 at 3:24 PM Jingsong Li <[email protected]> wrote:

> Hi Mike,
>
> This is a good question.
>
> As far as I know, Paimon does not strictly check that all partitions
> must have the same number of buckets. It is possible to achieve
> different buckets for different partitions, but it is more complex. We
> may need to scan the manifests when writing to ensure that the number
> of buckets written to the partitions is the same as before, otherwise
> it will cause inconsistent data correctness issues.
>
> Best,
> Jingsong
>
> On Mon, Feb 16, 2026 at 1:19 PM Mike Dias via dev <[email protected]> wrote:
> >
> > Hi Paimon maintainers,
> >
> > I'm looking to implement a change that would allow different partitions
> > within a PK fixed-bucket table to have different bucket counts, primarily
> > to support highly skewed partitions with more/fewer buckets.
> >
> > We would use dynamic buckets to handle skew, but we really need multiple
> > writers writing to the same active partitions in both streaming and batch,
> > which doesn't seem to be something we could easily support with dynamic
> > buckets without coordinating changes to the bucket index file...
> >
> > On the fixed-buckets side, though, it seems we are in a good spot to
> > implement per-partition bucketing, and this rescale doc
> > <https://paimon.apache.org/docs/1.3/maintenance/rescale-bucket/> suggests
> > we can already do that for partitions that aren't receiving writes.
> > Unfortunately, our partitions are not time-based, and most of them are
> > always receiving writes...
> >
> > Hence, we would need to adapt the current code to allow writers to look up
> > the bucket counts from the manifest partition rather than relying on the
> > global table bucket count.
> >
> > That brings me to the following questions:
> >
> >    1. *Can we actually do this?:* Are there architectural reasons why
> >    bucket counts must be uniform across all partitions? Are there
> >    assumptions elsewhere in the codebase that depend on a single global
> >    bucket count?
> >    2. *Concurrent writers:* If multiple writers are active, they each
> >    independently load the partition bucket mapping at initialization,
> >    which creates a risk of inconsistency if a rescale operation completes
> >    between when different writers load their mappings. This is not too
> >    different from the existing behavior, but with a global bucket count,
> >    it is much easier to safeguard against it. Do you have ideas on how we
> >    could mitigate this issue or warn users against this pitfall?
> >    3. *Read path:* On the read side, does the scan/split logic already
> >    handle partitions with heterogeneous bucket counts, or would changes be
> >    needed there as well?
> >
> >
> > Any guidance on gotchas or prior art in this area would be greatly
> > appreciated. Happy to share the full diff or open a draft PR if that would
> > be easier to review.
> >
> > --
> > Thanks,
> > Mike Dias
>

--
Thanks,
Mike Dias
