Look forward to hearing from you.

Best,
Jingsong

On Thu, Feb 26, 2026 at 2:37 PM Mike Dias <[email protected]> wrote:
>
> Oh, that is perfect! Thank you!
>
> I think we should be good to add this feature then. We are currently testing
> this patch internally and once we are happy with it, we can submit it as a PR
> to the main repository, if that is okay with you.
>
> On Thu, Feb 26, 2026 at 5:33 PM Jingsong Li <[email protected]> wrote:
>>
>> Hi Mike,
>>
>> For the second scenario, here is an option:
>> 'commit.strict-mode.last-safe-snapshot'. If you are using
>> RescaleAction, it will set this option to check this scenario.
>>
>> Best,
>> Jingsong
>>
>> On Thu, Feb 26, 2026 at 2:27 PM Mike Dias <[email protected]> wrote:
>> >
>> > Thanks, Jingsong!
>> >
>> > It seems we already check the number of buckets being equal when
>> > committing here ->
>> > https://github.com/apache/paimon/blob/e1eeec56954c19ed78fd0bd4a46e0a332443397d/paimon-core/src/main/java/org/apache/paimon/operation/commit/ConflictDetection.java#L219.
>> >
>> > I think that should capture the first scenario where:
>> >
>> > 1. writer starts
>> > 2. rescale starts
>> > 3. rescale commits
>> > 4. writer commits -> fails because the number of buckets changed
>> >
>> > I don't think it would address the second scenario where:
>> >
>> > 1. rescale starts
>> > 2. writer starts
>> > 3. writer commits
>> > 4. rescale commits -> previous commit is overwritten
>> >
>> > Is my understanding correct? Not sure if it is possible to detect the
>> > second scenario, though... users will need to ensure that no writer is
>> > running/started during the rescaling process.
>> >
>> >
>> > On Thu, Feb 26, 2026 at 3:24 PM Jingsong Li <[email protected]> wrote:
>> >>
>> >> Hi Mike,
>> >>
>> >> This is a good question.
>> >>
>> >> As far as I know, Paimon does not strictly check that all partitions
>> >> must have the same number of buckets. It is possible to achieve
>> >> different buckets for different partitions, but it is more complex. We
>> >> may need to scan the manifests when writing to ensure that the number
>> >> of buckets written to the partitions is the same as before, otherwise
>> >> it will cause inconsistent data correctness issues.
>> >>
>> >> Best,
>> >> Jingsong
>> >>
>> >> On Mon, Feb 16, 2026 at 1:19 PM Mike Dias via dev <[email protected]>
>> >> wrote:
>> >> >
>> >> > Hi Paimon maintainers,
>> >> >
>> >> > I'm looking to implement a change that would allow different partitions
>> >> > within a PK fixed-bucket table to have different bucket counts,
>> >> > primarily to support highly skewed partitions with more/fewer buckets.
>> >> >
>> >> > We would use dynamic buckets to handle skew, but we really need multiple
>> >> > writers writing to the same active partitions in both streaming and
>> >> > batch, which doesn't seem to be something we could easily support with
>> >> > dynamic buckets without coordinating changes to the bucket index file...
>> >> >
>> >> > On the fixed-buckets side, though, it seems we are in a good spot to
>> >> > implement per-partition bucketing, and this rescale doc
>> >> > <https://paimon.apache.org/docs/1.3/maintenance/rescale-bucket/>
>> >> > suggests we can already do that for partitions that aren't receiving
>> >> > writes. Unfortunately, our partitions are not time-based, and most of
>> >> > them are always receiving writes...
>> >> >
>> >> > Hence, we would need to adapt the current code to allow writers to look
>> >> > up the bucket counts from the manifest partition rather than relying on
>> >> > the global table bucket count.
>> >> >
>> >> > That brings me to the following questions:
>> >> >
>> >> > 1. *Can we actually do this?:* Are there architectural reasons why
>> >> > bucket counts must be uniform across all partitions? Are there
>> >> > assumptions elsewhere in the codebase that depend on a single global
>> >> > bucket count?
>> >> > 2. *Concurrent writers:* If multiple writers are active, they each
>> >> > independently load the partition bucket mapping at initialization,
>> >> > which creates a risk of inconsistency if a rescale operation completes
>> >> > between when different writers load their mappings. This is not too
>> >> > different from the existing behavior, but with a global bucket count,
>> >> > it is much easier to safeguard against it. Do you have ideas on how we
>> >> > could mitigate this issue or warn users against this pitfall?
>> >> > 3. *Read path:* On the read side, does the scan/split logic already
>> >> > handle partitions with heterogeneous bucket counts, or would changes
>> >> > be needed there as well?
>> >> >
>> >> >
>> >> > Any guidance on gotchas or prior art in this area would be greatly
>> >> > appreciated. Happy to share the full diff or open a draft PR if that
>> >> > would be easier to review.
>> >> >
>> >> > --
>> >> > Thanks,
>> >> > Mike Dias
>> >
>> >
>> >
>> > --
>> > Thanks,
>> > Mike Dias
>
>
>
> --
> Thanks,
> Mike Dias
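
For anyone following the thread, the two writer/rescale interleavings discussed above can be modelled with a small toy sketch. This is not Paimon code: `ToyTable`, `Snapshot`, `CommitConflict` and the two scenario functions are hypothetical names made up for illustration. The bucket-count check stands in for the check linked in ConflictDetection.java, and the `last_safe_snapshot` argument only approximates what 'commit.strict-mode.last-safe-snapshot' might do (here: reject the commit if another commit user added a snapshot after the recorded one). The real option's semantics may well differ, so treat this as a sketch under those assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class CommitConflict(Exception):
    """Raised when one of the toy commit checks rejects a commit."""


@dataclass
class Snapshot:
    id: int
    commit_user: str
    kind: str  # "APPEND" (writer) or "OVERWRITE" (rescale)


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table; not a Paimon class."""
    num_buckets: int
    snapshots: List[Snapshot] = field(default_factory=list)

    def latest_id(self) -> int:
        return self.snapshots[-1].id if self.snapshots else 0

    def commit(self, user: str, kind: str, num_buckets: int,
               last_safe_snapshot: Optional[int] = None) -> Snapshot:
        # Check 1: a writer that still uses a stale bucket count is rejected.
        if kind == "APPEND" and num_buckets != self.num_buckets:
            raise CommitConflict("number of buckets changed")
        # Check 2 (assumed strict-mode semantics): reject the commit if any
        # other commit user added a snapshot after last_safe_snapshot.
        if last_safe_snapshot is not None:
            for s in self.snapshots:
                if s.id > last_safe_snapshot and s.commit_user != user:
                    raise CommitConflict(f"snapshot {s.id} by {s.commit_user}")
        snap = Snapshot(self.latest_id() + 1, user, kind)
        self.snapshots.append(snap)
        if kind == "OVERWRITE":
            self.num_buckets = num_buckets  # the rescale changes the bucket count
        return snap


def scenario_one() -> str:
    """writer starts, rescale starts, rescale commits, writer commits."""
    table = ToyTable(num_buckets=4)
    writer_buckets = table.num_buckets   # writer starts and reads 4 buckets
    rescale_base = table.latest_id()     # rescale starts
    table.commit("rescale", "OVERWRITE", 8, last_safe_snapshot=rescale_base)
    try:
        table.commit("writer", "APPEND", writer_buckets)
    except CommitConflict:
        return "writer rejected"
    return "writer committed"


def scenario_two(strict: bool) -> str:
    """rescale starts, writer starts, writer commits, rescale commits."""
    table = ToyTable(num_buckets=4)
    rescale_base = table.latest_id()     # rescale starts
    writer_buckets = table.num_buckets   # writer starts
    table.commit("writer", "APPEND", writer_buckets)
    try:
        table.commit("rescale", "OVERWRITE", 8,
                     last_safe_snapshot=rescale_base if strict else None)
    except CommitConflict:
        return "rescale rejected"
    return "rescale committed"


if __name__ == "__main__":
    print(scenario_one())
    print(scenario_two(strict=False))
    print(scenario_two(strict=True))
```

In this model the bucket-count check alone catches the first interleaving, while the second one only fails when the rescale commit carries the snapshot it started from. That matches the shape of the discussion above, but it says nothing about how per-partition bucket counts would change either check.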
