danny0405 commented on code in PR #18411:
URL: https://github.com/apache/hudi/pull/18411#discussion_r3006985663
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SevenToEightUpgradeHandler.java:
##########
@@ -249,6 +249,12 @@ static void upgradeKeyGeneratorType(HoodieTableConfig tableConfig, Map<ConfigPro
}
}
+  // Use a large batch size for migration to minimize the number of parquet files created
+  // on remote storage. Each write() call involves multiple remote storage operations (exists check,
+  // parquet write, manifest update). Using the default archival batch size (10) with hundreds of
+  // actions creates excessive I/O that significantly increases the total migration time.
+ private static final int MIGRATION_BATCH_SIZE = 500;
Review Comment:
Wondering how much gain this large batch actually brings, something like a 30% cost saving?
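
The I/O reasoning in the diff comment can be sketched with a small batching example. This is an illustrative sketch, not Hudi's actual migration code: the `partition` helper and the action counts are hypothetical, and each batch is assumed to map to one remote `write()` call (with its exists check, parquet write, and manifest update).

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedMigrationSketch {
  // Mirrors the MIGRATION_BATCH_SIZE constant introduced in the diff.
  static final int MIGRATION_BATCH_SIZE = 500;

  /** Splits the action list into batches; each batch would incur one remote write() round trip. */
  static <T> List<List<T>> partition(List<T> actions, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < actions.size(); i += batchSize) {
      batches.add(actions.subList(i, Math.min(i + batchSize, actions.size())));
    }
    return batches;
  }

  public static void main(String[] args) {
    // Hypothetical workload of 1200 archived actions to migrate.
    List<Integer> actions = new ArrayList<>();
    for (int i = 0; i < 1200; i++) {
      actions.add(i);
    }
    // Default archival batch size (10): 120 remote write() calls.
    System.out.println(partition(actions, 10).size());
    // Migration batch size (500): only 3 remote write() calls.
    System.out.println(partition(actions, MIGRATION_BATCH_SIZE).size());
  }
}
```

With these numbers the batch-size change cuts the remote round trips from 120 to 3, which is where the wall-clock savings would come from; the actual percentage gain depends on per-call storage latency.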
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]