parisni opened a new pull request, #815:
URL: https://github.com/apache/incubator-xtable/pull/815
Improve Hudi sync performance when Hudi metadata column stats are not
enabled and XTable falls back to parquet footer reads on S3.
## Brief change log
- Parallelized computeColumnStatsFromParquetFooters in HudiFileStatsExtractor using Stream.parallel().
- Added regression test columnStatsWithoutMetadataTable_parallelFooterReadsAreThreadSafe to validate stability under parallel execution.
- Added runtime tuning note for ForkJoinPool parallelism: -Djava.util.concurrent.ForkJoinPool.common.parallelism=16.
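For readers unfamiliar with the pattern, the following is a minimal sketch of parallelizing per-file footer reads with Stream.parallel(). It is not the code from this PR: the class ParallelFooterStatsSketch, the FooterStats type, and the readFooter helper are hypothetical stand-ins, and the real HudiFileStatsExtractor reads actual parquet footers rather than sleeping. The sketch also prints the common pool's parallelism, which is the value the -Djava.util.concurrent.ForkJoinPool.common.parallelism property controls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelFooterStatsSketch {

  /** Hypothetical stand-in for the column stats read from one parquet footer. */
  public static final class FooterStats {
    public final String path;
    public final long rowCount;

    public FooterStats(String path, long rowCount) {
      this.path = path;
      this.rowCount = rowCount;
    }
  }

  /** Hypothetical footer reader: sleeps to simulate a remote read, then returns fake stats. */
  public static FooterStats readFooter(String path) {
    try {
      Thread.sleep(20); // simulated I/O latency (assumption, not a measured value)
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new FooterStats(path, path.length());
  }

  /**
   * Applies the reader to every path on a parallel stream. The reader must be
   * thread-safe because it is invoked concurrently; the collector merges the
   * partial results safely and preserves the input order of the list.
   */
  public static List<FooterStats> computeStats(
      List<String> paths, Function<String, FooterStats> reader) {
    return paths.stream().parallel().map(reader).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Parallel streams run on the common ForkJoinPool unless started from another pool.
    System.out.println(
        "common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());

    List<String> paths = Arrays.asList("a.parquet", "bb.parquet", "ccc.parquet");
    for (FooterStats stats : computeStats(paths, ParallelFooterStatsSketch::readFooter)) {
      System.out.println(stats.path + " -> " + stats.rowCount);
    }
  }
}
```

Because footer reads are I/O-bound, the default common-pool size (derived from the CPU count) may leave threads idle while waiting on remote reads, which is presumably why a larger parallelism value is suggested as a tuning option.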
## Impact
- In our XTable runs (fallback-to-S3 footer path), this reduced processing time by roughly a factor of 4.
## Verify this pull request
This change added tests and can be verified as follows:
- Run:
  - mvn -pl xtable-core -Dtest=TestHudiFileStatsExtractor test -DskipITs
- Optional tuned run:
  - MAVEN_OPTS="-Djava.util.concurrent.ForkJoinPool.common.parallelism=16" mvn -pl xtable-core -Dtest=TestHudiFileStatsExtractor test -DskipITs
- Result:
  - TestHudiFileStatsExtractor passed (3 tests, 0 failures).