Re: [PR] [HUDI-7466] Add parallel listing of existing partitions in Glue Catalog sync [hudi]

via GitHub Fri, 15 Mar 2024 13:32:51 -0700


parisni commented on code in PR #10460:
URL: https://github.com/apache/hudi/pull/10460#discussion_r1526787383



##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -141,105 +156,196 @@ public AWSGlueCatalogSyncClient(HiveSyncConfig config) {
     this.databaseName = config.getStringOrDefault(META_SYNC_DATABASE_NAME);
     this.skipTableArchive = config.getBooleanOrDefault(GlueCatalogSyncClientConfig.GLUE_SKIP_TABLE_ARCHIVE);
     this.enableMetadataTable = Boolean.toString(config.getBoolean(GLUE_METADATA_FILE_LISTING)).toUpperCase();
+    this.allPartitionsReadParallelism = config.getIntOrDefault(ALL_PARTITIONS_READ_PARALLELISM);
+    this.changedPartitionsReadParallelism = config.getIntOrDefault(CHANGED_PARTITIONS_READ_PARALLELISM);
+    this.changeParallelism = config.getIntOrDefault(PARTITION_CHANGE_PARALLELISM);
+  }
+
+  private List<Partition> getPartitionsSegment(Segment segment, String tableName) {
+    try {
+      List<Partition> partitions = new ArrayList<>();
+      String nextToken = null;
+      do {
+        GetPartitionsResponse result = awsGlue.getPartitions(GetPartitionsRequest.builder()
+            .databaseName(databaseName)
+            .tableName(tableName)
+            .segment(segment)

Review Comment:
   Excluding the column schema avoids the cost of parsing it in every response. 
That cost can be very high with millions of partitions or thousands of columns.
   ```suggestion
               .segment(segment)
               .excludeColumnSchema(true)
   ```



