parisni opened a new pull request, #815:
URL: https://github.com/apache/incubator-xtable/pull/815

     Improve Hudi sync performance when Hudi metadata column stats are not enabled and XTable falls back to reading Parquet footers from S3.
   
     ## Brief change log
   
     - Parallelized `computeColumnStatsFromParquetFooters` in `HudiFileStatsExtractor` using `Stream.parallel()`.
     - Added the regression test `columnStatsWithoutMetadataTable_parallelFooterReadsAreThreadSafe` to validate stability under parallel execution.
     - Added a runtime tuning note for the ForkJoinPool common-pool parallelism:
       `-Djava.util.concurrent.ForkJoinPool.common.parallelism=16`.
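     A minimal sketch of the parallel-footer-read pattern, assuming each footer read is independent and free of shared mutable state. `readFooterStats`, `computeStatsParallel`, `computeStatsSequential`, and the `s3://bucket/...` paths are hypothetical stand-ins, not XTable's actual `computeColumnStatsFromParquetFooters`; a fake per-file value replaces a real S3 footer read. Because the source is a `List`, `collect(Collectors.toList())` keeps encounter order, so parallel and sequential results can be compared directly, which is roughly the property a thread-safety regression test wants to pin down.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelFooterStatsSketch {

  // Hypothetical stand-in for a blocking Parquet footer read (e.g. an S3 GET).
  // It only reads its argument and touches no shared mutable state, so it is
  // safe to call from several ForkJoinPool worker threads at once.
  static Map.Entry<String, Long> readFooterStats(String path) {
    long fakeRowCount = path.length(); // placeholder for real footer metadata
    return Map.entry(path, fakeRowCount);
  }

  // Parallel version: files are processed on the common ForkJoinPool, and
  // collect() reassembles results in the list's original order.
  static List<Map.Entry<String, Long>> computeStatsParallel(List<String> paths) {
    return paths.parallelStream()
        .map(ParallelFooterStatsSketch::readFooterStats)
        .collect(Collectors.toList());
  }

  // Sequential baseline used to check that parallelism does not change results.
  static List<Map.Entry<String, Long>> computeStatsSequential(List<String> paths) {
    return paths.stream()
        .map(ParallelFooterStatsSketch::readFooterStats)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical file list; real code would list the table's data files.
    List<String> paths = IntStream.range(0, 1_000)
        .mapToObj(i -> "s3://bucket/table/part-" + i + ".parquet")
        .collect(Collectors.toList());

    List<Map.Entry<String, Long>> parallel = computeStatsParallel(paths);
    List<Map.Entry<String, Long>> sequential = computeStatsSequential(paths);

    System.out.println("results identical: " + parallel.equals(sequential));
    System.out.println("common pool parallelism: "
        + ForkJoinPool.commonPool().getParallelism());
  }
}
```

     The design choice that matters is collecting results through the stream rather than mutating a shared, non-thread-safe collection (for example calling `ArrayList.add` inside `forEach`), which is the usual way parallel streams break. Since footer reads are I/O-bound, the common pool's default size (by default one less than the number of available processors) can leave the network underused, which is presumably why raising `java.util.concurrent.ForkJoinPool.common.parallelism` helps on the S3 path.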
   
     ## Impact
   
     - In our XTable runs (on the fallback-to-S3 footer path), this made processing roughly 4x faster.
   
     ## Verify this pull request
   
     This change added tests and can be verified as follows:
   
     - Run:
         - `mvn -pl xtable-core -Dtest=TestHudiFileStatsExtractor test -DskipITs`
     - Optional tuned run:
         - `MAVEN_OPTS="-Djava.util.concurrent.ForkJoinPool.common.parallelism=16" mvn -pl xtable-core -Dtest=TestHudiFileStatsExtractor test -DskipITs`
     - Result:
         - TestHudiFileStatsExtractor passed (3 tests, 0 failures).
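     The tuning flag only matters if it reaches the JVM that actually runs the footer reads, so it may be worth confirming it took effect. The class below is a hypothetical helper, not part of XTable or this PR; it prints the property and the parallelism the common pool really uses. One point worth checking (an assumption, not something the PR states): `MAVEN_OPTS` configures the Maven JVM itself, and depending on the Surefire fork settings the tests may run in a separate JVM that does not see that flag, in which case something like Surefire's `argLine` parameter would be needed instead.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolParallelismCheck {

  static final String PROPERTY = "java.util.concurrent.ForkJoinPool.common.parallelism";

  // Parallelism the common pool actually ended up with. The JDK reads PROPERTY
  // when the common pool is initialized, so setting it afterwards has no effect.
  static int commonParallelism() {
    return ForkJoinPool.commonPool().getParallelism();
  }

  public static void main(String[] args) {
    System.out.println(PROPERTY + " = " + System.getProperty(PROPERTY));
    System.out.println("common pool parallelism = " + commonParallelism());
    System.out.println("available processors = "
        + Runtime.getRuntime().availableProcessors());
  }
}
```

     Running it as `java -Djava.util.concurrent.ForkJoinPool.common.parallelism=16 CommonPoolParallelismCheck.java` (single-file launch, Java 11+) should report a common pool parallelism of 16 if the property was applied.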
   