prashantwason commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1260358088
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -65,7 +66,23 @@ public class HoodieMetadataWriteUtils {
public static HoodieWriteConfig createMetadataWriteConfig(
      HoodieWriteConfig writeConfig, HoodieFailedWritesCleaningPolicy failedWritesCleaningPolicy) {
String tableName = writeConfig.getTableName() + METADATA_TABLE_NAME_SUFFIX;
- int parallelism = writeConfig.getMetadataInsertParallelism();
+
+    // MDT writes are always prepped. Hence, insert and upsert shuffle parallelism are not important to be configured. The same
+    // applies to delete parallelism, as deletes are not used.
+    // The finalize, cleaner and rollback tasks will operate on each fileGroup, so their parallelism should be as large as the
+    // total number of file groups. But it's not possible to accurately get the file group count here, so we keep these values
+    // large enough. This parallelism would anyway be limited by the executor counts.
+    final int defaultParallelism = 512;
+
+    // File groups in each partition are fixed at creation time and we do not want them to ever be split into multiple files.
+    // Hence, we use a very large basefile size in the metadata table. The actual size of the HFiles created will eventually
+    // depend on the number of file groups selected for each partition (see the estimateFileGroupCount function).
+    final long maxHFileSizeBytes = 10 * 1024 * 1024 * 1024L; // 10GB
Review Comment:
Done.
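One small aside on the 10GB constant in the hunk above: the `L` suffix on the final factor is what keeps the multiplication in 64-bit arithmetic. Without it, the product is computed in 32-bit `int` arithmetic and silently wraps to a negative value. A minimal, self-contained sketch (plain JDK; the class name is illustrative):

```java
public class HFileSizeOverflowDemo {
    public static void main(String[] args) {
        // Pure int arithmetic: 10 * 1024 * 1024 * 1024 = 10737418240, which
        // exceeds Integer.MAX_VALUE (2147483647) and silently wraps around.
        int wrapped = 10 * 1024 * 1024 * 1024;
        System.out.println(wrapped); // -2147483648

        // With the L suffix on the last factor (as in the patch), the final
        // multiplication is promoted to long before it can overflow.
        long maxHFileSizeBytes = 10 * 1024 * 1024 * 1024L;
        System.out.println(maxHFileSizeBytes); // 10737418240, i.e. 10 GiB
    }
}
```

Note that the suffix is only needed on (at least) the last factor: `10 * 1024 * 1024` alone still fits in an `int`, so the earlier multiplications cannot overflow.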
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]