danny0405 commented on code in PR #18411:
URL: https://github.com/apache/hudi/pull/18411#discussion_r3006985663


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/SevenToEightUpgradeHandler.java:
##########
@@ -249,6 +249,12 @@ static void upgradeKeyGeneratorType(HoodieTableConfig tableConfig, Map<ConfigPro
     }
   }
 
+  // Use a large batch size for migration to minimize the number of parquet files created
+  // on remote storage. Each write() call involves multiple remote storage operations (exists check,
+  // parquet write, manifest update). Using the default archival batch size (10) with hundreds of
+  // actions creates excessive I/O that significantly increases the total migration time.
+  private static final int MIGRATION_BATCH_SIZE = 500;

Review Comment:
   Wondering how much we gain from this large batch size — something like a 30% cost saving?
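
   To make the trade-off concrete, here is a minimal, hypothetical sketch (not the actual Hudi migration code) of how grouping actions into batches of `MIGRATION_BATCH_SIZE` reduces the number of remote `write()` calls, and thus the per-call overhead (exists check, parquet write, manifest update):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical illustration: each batch maps to one remote write() call,
   // so fewer, larger batches amortize the fixed per-call storage overhead.
   public class MigrationBatcher {
     static final int MIGRATION_BATCH_SIZE = 500;

     // Split a list of actions into batches of at most batchSize elements.
     static <T> List<List<T>> partition(List<T> actions, int batchSize) {
       List<List<T>> batches = new ArrayList<>();
       for (int i = 0; i < actions.size(); i += batchSize) {
         batches.add(actions.subList(i, Math.min(i + batchSize, actions.size())));
       }
       return batches;
     }

     public static void main(String[] args) {
       List<Integer> actions = new ArrayList<>();
       for (int i = 0; i < 1200; i++) {
         actions.add(i);
       }
       // With 1200 actions: 3 batches (500 + 500 + 200) instead of 120
       // batches at the default archival batch size of 10.
       System.out.println(partition(actions, MIGRATION_BATCH_SIZE).size());
       System.out.println(partition(actions, 10).size());
     }
   }
   ```

   Under this (assumed) one-write-per-batch model, 1200 actions drop from 120 remote writes to 3, so the savings scale with the fixed cost per `write()` rather than a flat percentage.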



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
