prashantwason commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1260358088
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -65,7 +66,23 @@ public class HoodieMetadataWriteUtils {
public static HoodieWriteConfig createMetadataWriteConfig(
      HoodieWriteConfig writeConfig, HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy) {
String tableName = writeConfig.getTableName() + METADATA_TABLE_NAME_SUFFIX;
- int parallelism = writeConfig.getMetadataInsertParallelism();
+
+    // MDT writes are always prepped. Hence, insert and upsert shuffle parallelism are not important to be configured. The same
+    // applies to delete parallelism, as deletes are not used.
+    // The finalize, cleaner and rollback tasks will operate on each fileGroup, so their parallelism should be as large as the
+    // total number of file groups. But it's not possible to accurately get the file group count here, so we keep these values
+    // large enough. This parallelism would anyway be limited by the executor counts.
+    final int defaultParallelism = 512;
+
+    // File groups in each partition are fixed at creation time and we do not want them to ever be split into multiple files.
+    // Hence, we use a very large basefile size in the metadata table. The actual size of the HFiles created will eventually
+    // depend on the number of file groups selected for each partition (see the estimateFileGroupCount function).
+    final long maxHFileSizeBytes = 10 * 1024 * 1024 * 1024L; // 10GB
Review Comment:
Done.
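One small aside on the 10GB constant in the hunk above: the `L` suffix on the final factor is what keeps the multiplication in 64-bit arithmetic. Without it, the product is computed in 32-bit `int` arithmetic and silently wraps to a negative value. A minimal, self-contained sketch (plain JDK; the class name is illustrative):

```java
public class HFileSizeOverflowDemo {
    public static void main(String[] args) {
        // Pure int arithmetic: 10 * 1024 * 1024 * 1024 = 10737418240, which
        // exceeds Integer.MAX_VALUE (2147483647) and silently wraps around.
        int wrapped = 10 * 1024 * 1024 * 1024;
        System.out.println(wrapped); // -2147483648

        // With the L suffix on the last factor (as in the patch), the final
        // multiplication is promoted to long before it can overflow.
        long maxHFileSizeBytes = 10 * 1024 * 1024 * 1024L;
        System.out.println(maxHFileSizeBytes); // 10737418240, i.e. 10 GiB
    }
}
```

Note that the suffix is only needed on (at least) the last factor: `10 * 1024 * 1024` alone still fits in an `int`, so the earlier multiplications cannot overflow.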
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]