parisni commented on code in PR #10460:
URL: https://github.com/apache/hudi/pull/10460#discussion_r1526787383
##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -141,105 +156,196 @@ public AWSGlueCatalogSyncClient(HiveSyncConfig config) {
     this.databaseName = config.getStringOrDefault(META_SYNC_DATABASE_NAME);
     this.skipTableArchive = config.getBooleanOrDefault(GlueCatalogSyncClientConfig.GLUE_SKIP_TABLE_ARCHIVE);
     this.enableMetadataTable = Boolean.toString(config.getBoolean(GLUE_METADATA_FILE_LISTING)).toUpperCase();
+    this.allPartitionsReadParallelism = config.getIntOrDefault(ALL_PARTITIONS_READ_PARALLELISM);
+    this.changedPartitionsReadParallelism = config.getIntOrDefault(CHANGED_PARTITIONS_READ_PARALLELISM);
+    this.changeParallelism = config.getIntOrDefault(PARTITION_CHANGE_PARALLELISM);
+  }
+
+  private List<Partition> getPartitionsSegment(Segment segment, String tableName) {
+    try {
+      List<Partition> partitions = new ArrayList<>();
+      String nextToken = null;
+      do {
+        GetPartitionsResponse result = awsGlue.getPartitions(GetPartitionsRequest.builder()
+            .databaseName(databaseName)
+            .tableName(tableName)
+            .segment(segment)
Review Comment:
Excluding the column schema lowers the resources needed to parse the schema in each response.
That cost can be very high with millions of partitions or thousands of columns.
```suggestion
.segment(segment)
.excludeColumnSchema(true)
```
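To illustrate why the flag belongs on the request builder inside the loop, here is a minimal, self-contained sketch of token-driven pagination where the flag is passed on every page request. It does not use the AWS SDK: `PartitionStub`, `Page`, `PageSource`, and `inMemory` are hypothetical stand-ins invented for this example, and the in-memory source only models the assumed effect (partitions returned without their column lists when the flag is set).

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPagingSketch {

  // Hypothetical stand-in for a partition: its key plus its column names.
  static final class PartitionStub {
    final String values;
    final List<String> columns;

    PartitionStub(String values, List<String> columns) {
      this.values = values;
      this.columns = columns;
    }
  }

  // Hypothetical stand-in for one page of results and its continuation token.
  static final class Page {
    final List<PartitionStub> partitions;
    final String nextToken;

    Page(List<PartitionStub> partitions, String nextToken) {
      this.partitions = partitions;
      this.nextToken = nextToken;
    }
  }

  // Hypothetical stand-in for the catalog client; not the AWS SDK interface.
  interface PageSource {
    Page getPage(String nextToken, boolean excludeColumnSchema);
  }

  // Drains every page, sending the flag with each request,
  // mirroring the do/while nextToken loop in the diff.
  static List<PartitionStub> fetchAll(PageSource source, boolean excludeColumnSchema) {
    List<PartitionStub> partitions = new ArrayList<>();
    String nextToken = null;
    do {
      Page page = source.getPage(nextToken, excludeColumnSchema);
      partitions.addAll(page.partitions);
      nextToken = page.nextToken;
    } while (nextToken != null);
    return partitions;
  }

  // In-memory source serving fixed-size pages; the token is the next start index.
  // When excludeColumnSchema is true, partitions come back with no column list.
  static PageSource inMemory(List<String> keys, List<String> columns, int pageSize) {
    return (token, excludeColumnSchema) -> {
      int start = token == null ? 0 : Integer.parseInt(token);
      int end = Math.min(start + pageSize, keys.size());
      List<PartitionStub> page = new ArrayList<>();
      for (String key : keys.subList(start, end)) {
        page.add(new PartitionStub(key, excludeColumnSchema ? List.of() : columns));
      }
      String next = end < keys.size() ? Integer.toString(end) : null;
      return new Page(page, next);
    };
  }

  public static void main(String[] args) {
    List<String> keys = List.of("dt=2024-01-01", "dt=2024-01-02", "dt=2024-01-03");
    PageSource source = inMemory(keys, List.of("id", "name"), 2);
    for (PartitionStub p : fetchAll(source, true)) {
      System.out.println(p.values + " columns=" + p.columns.size());
    }
  }
}
```

Because the request is rebuilt for each page, a flag set on the builder applies to every page of every segment, which is where the savings would add up on large tables.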