VitoMakarevich commented on code in PR #10460:
URL: https://github.com/apache/hudi/pull/10460#discussion_r1524878225
##########
hudi-aws/src/main/java/org/apache/hudi/config/GlueCatalogSyncClientConfig.java:
##########
@@ -40,6 +42,28 @@ public class GlueCatalogSyncClientConfig extends HoodieConfig {
.sinceVersion("0.14.0")
.withDocumentation("Glue catalog sync based client will skip archiving the table version if this config is set to true");
+ public static final ConfigProperty<Integer> ALL_PARTITIONS_READ_PARALLELISM = ConfigProperty
+ .key(GLUE_CLIENT_PROPERTY_PREFIX + "all_partitions_read_parallelism")
+ .defaultValue(1)
+ .markAdvanced()
+ .withValidValues(IntStream.rangeClosed(1, 10).mapToObj(Integer::toString).toArray(String[]::new))
+ .sinceVersion("1.0.0")
+ .withDocumentation("Parallelism for listing all partitions(first time sync). Should be in interval [1, 10].");
+
+ public static final ConfigProperty<Integer> CHANGED_PARTITIONS_READ_PARALLELISM = ConfigProperty
+ .key(GLUE_CLIENT_PROPERTY_PREFIX + "changed_partitions_read_parallelism")
+ .defaultValue(1)
+ .markAdvanced()
+ .sinceVersion("1.0.0")
+ .withDocumentation("Parallelism for listing changed partitions(second and subsequent syncs).");
Review Comment:
Yeah, because ALL_PARTITIONS_READ_PARALLELISM is limited to [1, 10]: it uses
[GetPartitions](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-partitions.html#aws-glue-api-catalog-partitions-GetPartition),
which serves the initial load and lets us split the N existing partitions into
up to 10 segments and fetch each segment independently (within a segment,
paging works the same as without segments, via the continuation token).
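A minimal sketch of the segmented listing described above, assuming hypothetical `SegmentedLister` and `Page` types as stand-ins for a Glue `GetPartitions` call (whose request carries a `Segment` with `SegmentNumber`/`TotalSegments` plus a `NextToken`). How the service assigns partitions to segments is not modeled; each segment is simply read on its own thread, following its continuation token until it runs out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: SegmentedLister and Page are hypothetical stand-ins, not the AWS SDK.
final class SegmentedListingSketch {

  /** One page of a segment: its partitions plus the token for the next page (null when done). */
  static final class Page {
    final List<String> partitions;
    final String nextToken;

    Page(List<String> partitions, String nextToken) {
      this.partitions = partitions;
      this.nextToken = nextToken;
    }
  }

  /** Hypothetical stand-in for one segmented GetPartitions call (token is null for the first page). */
  interface SegmentedLister {
    Page getPage(int segmentNumber, int totalSegments, String token);
  }

  /** Lists every partition by reading totalSegments segments concurrently, each page by page. */
  static List<String> listAll(SegmentedLister lister, int totalSegments) {
    // Glue allows TotalSegments between 1 and 10, hence the [1, 10] bound on the config.
    if (totalSegments < 1 || totalSegments > 10) {
      throw new IllegalArgumentException("totalSegments must be in [1, 10]");
    }
    ExecutorService pool = Executors.newFixedThreadPool(totalSegments);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (int seg = 0; seg < totalSegments; seg++) {
        final int segment = seg; // segment numbers are 0-based
        Callable<List<String>> readSegment = () -> {
          List<String> out = new ArrayList<>();
          String token = null;
          do { // follow the continuation token until this segment is exhausted
            Page page = lister.getPage(segment, totalSegments, token);
            out.addAll(page.partitions);
            token = page.nextToken;
          } while (token != null);
          return out;
        };
        futures.add(pool.submit(readSegment));
      }
      List<String> all = new ArrayList<>();
      for (Future<List<String>> f : futures) {
        all.addAll(f.get()); // rethrows a segment's failure as ExecutionException
      }
      return all;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while listing partitions", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Listing a segment failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```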
CHANGED_PARTITIONS_READ_PARALLELISM, on the other hand, uses
[BatchGetPartition](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-partitions.html#aws-glue-api-catalog-partitions-BatchGetPartition),
and the trick there is that we can request exactly the partitions we need,
since we know all of them from the commit file. In theory its parallelism can
be very high, but a user will likely want to cap it at some number to avoid
running into many retries. Each request covers up to 1000 partitions, so it is
highly unlikely anyone operates at a scale where this matters, but still.
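A minimal sketch of the changed-partition path, assuming a hypothetical `batchGet` function as a stand-in for one `BatchGetPartition` request. The partitions known from the commit are chunked into batches of at most 1000 (the per-request limit on `PartitionsToGet`) and submitted to a fixed pool whose size plays the role of CHANGED_PARTITIONS_READ_PARALLELISM. It does not handle `UnprocessedKeys` or retries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch only: batchGet is a hypothetical stand-in for one BatchGetPartition request.
final class BatchedLookupSketch {

  // BatchGetPartition accepts at most 1000 entries in PartitionsToGet per request.
  static final int MAX_BATCH_SIZE = 1000;

  /** Splits the partitions known from the commit into consecutive batches of at most batchSize. */
  static List<List<String>> chunk(List<String> partitions, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < partitions.size(); i += batchSize) {
      batches.add(partitions.subList(i, Math.min(i + batchSize, partitions.size())));
    }
    return batches;
  }

  /** Fetches every batch, with at most `parallelism` requests in flight at once. */
  static List<String> fetchChanged(List<String> changedPartitions, int parallelism,
                                   Function<List<String>, List<String>> batchGet) {
    if (parallelism < 1) {
      throw new IllegalArgumentException("parallelism must be >= 1");
    }
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (List<String> batch : chunk(changedPartitions, MAX_BATCH_SIZE)) {
        Callable<List<String>> request = () -> batchGet.apply(batch);
        futures.add(pool.submit(request)); // queued until one of the pool threads is free
      }
      List<String> found = new ArrayList<>();
      for (Future<List<String>> f : futures) {
        found.addAll(f.get()); // results are collected in batch order
      }
      return found;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while fetching partitions", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A batch request failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

For example, 2500 changed partitions become 3 requests regardless of the parallelism setting; the setting only bounds how many of them run at once.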
--
This is an automated message from the Apache Git Service.