blackflash997997 opened a new issue, #7385: URL: https://github.com/apache/paimon/issues/7385
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

My two Paimon tables are both primary key tables with a fixed bucket count of 16. I'm using Spark SQL to execute a join on these two tables. One table has 80 million records, the other has 150 million. With my left join, writing to a non-primary-key table (without a bucket key) is the fastest. However, if I write to a table with a fixed bucket key or with dynamic bucketing, it becomes more than twice as slow. How can I remove the requirement that a primary key table use fixed or dynamic bucketing? Otherwise it severely impacts compute performance.

### Compute Engine

Spark 3.5.2

### Minimal reproduce step

Tables A and B have the same primary key and the same fixed bucket count; table C has the same fixed bucket count. Then run:
`insert into c select * from A left join B on a.id = b.id and a.id1 = b.id2`

### What doesn't meet your expectations?

I want to be able to configure table C with a no-bucket mode to improve performance.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
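A minimal sketch of the setup described above, assuming a Spark SQL session with a Paimon catalog already configured. The table names follow the reproduce step, but the column names, primary keys, and the `'bucket' = '16'` property are illustrative placeholders, not the reporter's actual schema:

```sql
-- Hypothetical DDL reproducing the reported setup: two fixed-bucket
-- primary key source tables and a fixed-bucket primary key sink table.
CREATE TABLE a (
    id  BIGINT,
    id1 BIGINT,
    v   STRING
) TBLPROPERTIES (
    'primary-key' = 'id,id1',
    'bucket'      = '16'
);

CREATE TABLE b (
    id  BIGINT,
    id2 BIGINT,
    v   STRING
) TBLPROPERTIES (
    'primary-key' = 'id,id2',
    'bucket'      = '16'
);

CREATE TABLE c (
    id  BIGINT,
    id1 BIGINT,
    v   STRING
) TBLPROPERTIES (
    'primary-key' = 'id,id1',
    'bucket'      = '16'  -- fixed bucket; the reporter wants to drop this requirement
);

-- The join that is reported to become more than twice as slow
-- when the sink table c is bucketed:
INSERT INTO c
SELECT a.id, a.id1, a.v
FROM a LEFT JOIN b ON a.id = b.id AND a.id1 = b.id2;
```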
