blackflash997997 opened a new issue, #7385:
URL: https://github.com/apache/paimon/issues/7385

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Paimon version
   
My two Paimon tables are both primary key tables with a fixed bucket count of 
16. I am using Spark SQL to execute a join on these two tables: one table has 
80 million records, the other 150 million. With my left-join SQL, writing the 
result into a non-primary-key table (one without a bucket key) is the fastest. 
However, writing into a table with a fixed bucket key, or one using dynamic 
bucketing, is more than twice as slow. How can I remove the requirement that a 
primary key table use fixed or dynamic bucketing? Otherwise it severely hurts 
write performance.
   
   
   ### Compute Engine
   
   spark 3.5.2
   
   ### Minimal reproduce step
   
   Tables A and B have the same primary key and the same fixed bucket number; 
table C has the same fixed bucket number.
   
   INSERT INTO C
   SELECT * FROM A LEFT JOIN B ON a.id = b.id AND a.id1 = b.id2;
   
   ### What doesn't meet your expectations?
   
   I expect to be able to configure table C in a bucket-less mode to improve 
write performance.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

