[I] [Feature] Spark Merge optimization : reading only key columns from target table [paimon]

via GitHub Tue, 25 Nov 2025 00:39:16 -0800


VasilyMelnik opened a new issue, #6669:
URL: https://github.com/apache/paimon/issues/6669


   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
   
   
   ### Motivation
   
   We merge 1 million rows from a source table into a target table that has 10 million rows:
   ```
   MERGE INTO TableIcebergMOR target
     USING TableIcebergMOR_1000000 source
     ON target.id = source.id
     WHEN MATCHED THEN
     UPDATE SET *
     WHEN NOT MATCHED
     THEN INSERT *
   ```
   In the physical plan we see that the target table is fully scanned and shuffled:
   <img width="522" height="591" alt="Image" src="https://github.com/user-attachments/assets/7887a38f-32c4-4edc-bc43-fa968c9ef8e6" />
   The same query in Apache Iceberg scans only the needed columns of the target table, so its scan and shuffle **are several times faster**:
   
   <img width="428" height="361" alt="Image" src="https://github.com/user-attachments/assets/1c9772a1-7475-45cf-9e5d-72e052555203" />
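   
   To illustrate why the key column is enough here: both `UPDATE SET *` and `INSERT *` take every value from `source`, so deciding which rows match needs only `id` from the target. Below is a minimal sketch of such a key-only probe in Spark SQL. It is an assumption for illustration, not the plan that Paimon or Iceberg actually produces, and the `matched` alias is hypothetical:
   ```sql
   -- Hypothetical illustration: probe the target using only its join key.
   -- All other output values come from source, so no other target columns are read.
   SELECT source.*,
          target.id IS NOT NULL AS matched  -- true: row would be updated; false: row would be inserted
   FROM TableIcebergMOR_1000000 source
   LEFT JOIN (SELECT id FROM TableIcebergMOR) target  -- key-only projection of the target
     ON target.id = source.id
   ```
   A real implementation would presumably also need row-location metadata from the target to rewrite or delete the matched rows; that part is omitted from this sketch.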
   
   ### Solution
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


