blackflash997997 commented on issue #7385:
URL: https://github.com/apache/paimon/issues/7385#issuecomment-4035647442
> Hi, I would like to work on reproducing and investigating this issue.
>
> I will try to reproduce the performance problem with fixed bucket primary
key tables and analyze the Spark execution plan. If I can reproduce it, I will
propose a fix.
>
> Any pointers to the relevant modules (bucket assignment / join write path)
would be appreciated.
my test join sql like:
```sql
insert into some_table
select bseg.*,bkpf.*
from big_table_A bseg
left join big_table_B bkpf
on bseg.bukrs = bkpf.bukrs
and bseg.belnr = bkpf.belnr
and bseg.gjahr = bkpf.gjahr
```
big_table_A:
- 80 million records
- 16 fixed buckets
- primary-key: bukrs,belnr,gjahr,buzei
- total columns: 104

big_table_B:
- 150 million records
- 16 fixed buckets
- primary-key: bukrs,belnr,gjahr
- total columns: 100
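As background for the job breakdowns below, here is a minimal sketch of what fixed-bucket routing implies for the write path: every row must be assigned to exactly one of the N buckets before it can be written, so Spark has to hash-repartition the join output by the bucket key first. This is only an illustration; Python's built-in `hash()` stands in for Paimon's actual bucket hash, and the sample rows are hypothetical.

```python
# Toy illustration of fixed-bucket routing: each row is mapped to one of
# num_buckets writer tasks based on its bucket key, which is why writing
# into a 16-bucket table forces a hash repartition of the join output.
# NOTE: Python's built-in hash() is a stand-in; Paimon computes its own
# hash over the serialized bucket-key row.

def assign_bucket(bucket_key: tuple, num_buckets: int) -> int:
    """Route a row to one of num_buckets writer tasks by its bucket key."""
    return abs(hash(bucket_key)) % num_buckets

# Hypothetical (bukrs, belnr, gjahr, buzei) rows from the join output.
rows = [
    ("1000", "4900000001", "2024", "001"),
    ("1000", "4900000002", "2024", "001"),
    ("2000", "4900000003", "2023", "002"),
]
# Bucket on the first three columns (bukrs, belnr, gjahr).
buckets = [assign_bucket(r[:3], 16) for r in rows]
assert all(0 <= b < 16 for b in buckets)
```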
I start spark-sql with this configuration:
```shell
spark-sql --num-executors 6 --executor-cores 2 --executor-memory 6g \
  --conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.paimon_catalog=org.apache.paimon.spark.SparkCatalog \
  --conf spark.sql.catalog.paimon_catalog.warehouse=seaweedfs://swfs:8888/tmp/test/lakehouse/ \
  --conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions \
  --conf spark.task.maxFailures=10 \
  --conf spark.sql.sources.v2.bucketing.enabled=true
```
### test_scene 1
some_table: w/o pk, 'bucket' = -1
Spark splits the whole SQL into 2 jobs:
- 1. scan table duration: 3.5 min
- 2. calculate and write duration: 5.6 min
  - Output Size / Records: 3.1 GiB / 85031216
  - Shuffle Read Size / Records: 19.0 GiB / 233714595
### test_scene 2
some_table: pk table, 'bucket' = 16, primary-key: bukrs,belnr,gjahr,buzei
Spark splits the whole SQL into 3 jobs:
- 1. scan table duration: 3.7 min
- 2. shuffle duration: 2.9 min
  - Shuffle Read Size / Records: 19.0 GiB / 233714595
  - Shuffle Write Size / Records: 19.9 GiB / 85031216
- 3. shuffle, calculate and write duration: 10 min
  - Output Size / Records: 3.5 GiB / 85031216
  - Shuffle Read Size / Records: 19.9 GiB / 85031216
### test_scene 3
some_table: pk table, 'bucket' = -1, primary-key: bukrs,belnr,gjahr,buzei
Spark also splits the whole SQL into 3 jobs, and it is even slower!
### direct write
I also tested the following SQL:
```sql
insert into table_B select * from table_A
```
table_A: 16 fixed buckets, bucket-key = 'col1,col2,col3'
table_B:
- scene 1: w/o pk, 'bucket' = -1
  Goes through 2 Spark jobs:
  - "scan"
  - "shuffle read -> output"
- scene 2: w/o pk, 'bucket' = 16, bucket-key = 'col1'
  Goes through 3 Spark jobs:
  - "scan"
  - "shuffle read -> shuffle write"
  - "shuffle read -> output"
- scene 3: w/o pk, 'bucket' = 16, bucket-key = 'col1,col2,col3'
  Goes through 3 Spark jobs:
  - "scan"
  - "shuffle read -> shuffle write"
  - "shuffle read -> output"
- scene 4: pk, 'bucket' = 16, bucket-key = 'col1'
  Goes through 3 Spark jobs:
  - "scan"
  - "shuffle read -> shuffle write"
  - "shuffle read -> output"
### conclusion
As the two test SQLs above show, writing data from a bucketed source table into a bucketed destination table, whether the bucketing strategy is fixed or dynamic, is slower than writing into a non-bucketed table because of the extra shuffle.
Furthermore, the more data there is, the larger the gap; with small amounts of data the difference is not noticeable.
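The most striking case is scene 3 of the direct-write test, where source and destination share the same bucket key and bucket count: a deterministic bucket function then maps each row back to the bucket it already occupies, so the shuffle moves rows to where they already are. A toy check of that reasoning (again, Python's `hash()` stands in for Paimon's real bucket hash, and the keys are hypothetical):

```python
# Toy check: with the same bucket key and the same bucket count on both
# sides, a deterministic bucket function assigns each row to the same
# bucket in the source and in the destination -- the extra shuffle is
# logically a no-op in this configuration.
# NOTE: Python's hash() is a stand-in for Paimon's real bucket hash.

def bucket_of(key: tuple, num_buckets: int) -> int:
    return abs(hash(key)) % num_buckets

keys = [("1000", "001"), ("2000", "002"), ("3000", "003")]
src_buckets = [bucket_of(k, 16) for k in keys]  # bucket in the source table
dst_buckets = [bucket_of(k, 16) for k in keys]  # bucket recomputed on write
assert src_buckets == dst_buckets  # identical: nothing actually moves
```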
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]