shangxinli opened a new pull request, #13674:
URL: https://github.com/apache/hudi/pull/13674

   ### Change Logs
   This change introduces a file stitching optimization for Hudi clustering 
that merges Parquet row groups from files with compatible schemas, using the 
Parquet API. The implementation adds HoodieParquetStrictMerge for efficient 
file merging and LiteFileBinaryCopier for optimized file copying, and updates 
PartitionAwareClusteringPlanStrategy to support row group-level merging. A new 
configuration, PARQUET_LITE_FILE_MERGER_ENABLE, controls this feature.
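
To illustrate what merging "based on schema compatibility" could mean at planning time, here is a minimal sketch that buckets candidate files by schema so that only files in the same bucket would be stitched together. It is not the code in this PR: the class `SchemaGroupingSketch`, the method `groupBySchema`, and the treatment of "compatible" as "identical schema string" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the PR's implementation.
public class SchemaGroupingSketch {

  /**
   * Groups file paths by their schema string, preserving input order.
   * Assumption: two files are "compatible" only when their schema strings are equal.
   *
   * @param pathToSchema map from file path to a string rendering of its schema
   * @return map from schema string to the paths that share it
   */
  static Map<String, List<String>> groupBySchema(Map<String, String> pathToSchema) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : pathToSchema.entrySet()) {
      groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
    }
    return groups;
  }

  public static void main(String[] args) {
    Map<String, String> files = new LinkedHashMap<>();
    files.put("a.parquet", "schemaA");
    files.put("b.parquet", "schemaB");
    files.put("c.parquet", "schemaA");
    // Each printed group is a candidate set for row group-level stitching.
    groupBySchema(files).forEach((schema, paths) ->
        System.out.println(schema + " -> " + paths));
  }
}
```

Grouping before merging keeps the stitching step simple: within a group, row groups can be copied without rewriting records, because no per-record schema reconciliation is needed.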
   
   ### Impact
   
     - New configuration: hoodie.storage.parquet.lite.file.merger.enable 
(default: false)
     - Enhanced PartitionAwareClusteringPlanStrategy with row group merging 
capabilities
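
Because the feature is off by default, it has to be switched on explicitly. The sketch below shows one way to set and read the flag using plain `java.util.Properties`. The key and its default come from this description; the class `LiteFileMergerConfigSketch` and its helper methods are hypothetical, and how the properties reach the Hudi write client is left out.

```java
import java.util.Properties;

// Hypothetical helper, not part of the PR.
public class LiteFileMergerConfigSketch {

  // Config key as stated in this PR description; default is false.
  static final String LITE_FILE_MERGER_ENABLE =
      "hoodie.storage.parquet.lite.file.merger.enable";

  /** Builds a Properties object with the lite file merger flag set. */
  static Properties withLiteMerger(boolean enabled) {
    Properties props = new Properties();
    props.setProperty(LITE_FILE_MERGER_ENABLE, Boolean.toString(enabled));
    return props;
  }

  /** Reads the flag, falling back to the documented default of false. */
  static boolean isLiteMergerEnabled(Properties props) {
    return Boolean.parseBoolean(props.getProperty(LITE_FILE_MERGER_ENABLE, "false"));
  }

  public static void main(String[] args) {
    Properties props = withLiteMerger(true);
    System.out.println(LITE_FILE_MERGER_ENABLE + "=" + isLiteMergerEnabled(props));
  }
}
```

Leaving the key unset keeps the existing clustering behavior, which matches the backward-compatibility note under the risk section.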
   
   ### Risk level (medium)
   
    Verification done to mitigate risks:
     - Added unit tests in HoodieParquetStrictMergeTest and 
TestClusteringLiteFileMerger
     - Integration tests for partition-aware clustering strategy
     - Feature is disabled by default and requires explicit configuration
     - Maintains backward compatibility with existing clustering behavior
   
   ### Documentation Update
   
     Required updates:
     - Configuration documentation needs an update for the new 
hoodie.storage.parquet.lite.file.merger.enable config
     - Clustering strategy documentation should include information about row 
group merging optimization
     - Performance tuning guide should mention this optimization for large file 
scenarios
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
