parisni commented on code in PR #9071:
URL: https://github.com/apache/hudi/pull/9071#discussion_r1254221958
##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -569,6 +587,53 @@ private static Table getTable(AWSGlue awsGlue, String databaseName, String table
}
}
+ // TODO: make this faster with Glue Segment API
+ private static List<com.amazonaws.services.glue.model.Partition> getAllGluePartitions(AWSGlue awsGlue,
+ String databaseName,
+ String tableName) {
+ try {
+ List<com.amazonaws.services.glue.model.Partition> partitions = new ArrayList<>();
+ String nextToken = null;
+ do {
+ GetPartitionsResult result = awsGlue.getPartitions(new GetPartitionsRequest()
+ .withDatabaseName(databaseName)
+ .withTableName(tableName)
+ .withNextToken(nextToken));
Review Comment:
Set `.withExcludeColumnSchema(true)` on the `GetPartitionsRequest` to limit
network transfer?
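
A minimal sketch of the idea, not the PR's code. `GlueStub`, `PageStub`, and `PartitionStub` are hypothetical stand-ins for the Glue client and its result types, used only so the example runs without the AWS SDK. The `excludeColumnSchema` argument plays the role of `.withExcludeColumnSchema(true)` on the real request. It keeps the same token loop as the diff and shows that partition values are still returned when the column schema is left out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a Glue partition (not the SDK class).
class PartitionStub {
  final List<String> values;
  final List<String> columns; // null when the column schema is excluded

  PartitionStub(List<String> values, List<String> columns) {
    this.values = values;
    this.columns = columns;
  }
}

// Hypothetical stand-in for one page of results.
class PageStub {
  final List<PartitionStub> partitions;
  final String nextToken; // null on the last page

  PageStub(List<PartitionStub> partitions, String nextToken) {
    this.partitions = partitions;
    this.nextToken = nextToken;
  }
}

// Hypothetical client that serves `total` partitions in pages of `pageSize`.
class GlueStub {
  private final int total;
  private final int pageSize;

  GlueStub(int total, int pageSize) {
    this.total = total;
    this.pageSize = pageSize;
  }

  PageStub getPartitions(String nextToken, boolean excludeColumnSchema) {
    int start = nextToken == null ? 0 : Integer.parseInt(nextToken);
    int end = Math.min(start + pageSize, total);
    List<PartitionStub> page = new ArrayList<>();
    for (int i = start; i < end; i++) {
      // The per-partition column list is the payload the flag would drop.
      List<String> columns = excludeColumnSchema
          ? null
          : Arrays.asList("id:bigint", "name:string");
      page.add(new PartitionStub(Collections.singletonList("part=" + i), columns));
    }
    return new PageStub(page, end < total ? String.valueOf(end) : null);
  }
}

class PartitionPaging {
  // Same do/while token loop as in the diff, passing the flag on every call.
  static List<PartitionStub> getAllPartitions(GlueStub glue, boolean excludeColumnSchema) {
    List<PartitionStub> partitions = new ArrayList<>();
    String nextToken = null;
    do {
      PageStub result = glue.getPartitions(nextToken, excludeColumnSchema);
      partitions.addAll(result.partitions);
      nextToken = result.nextToken;
    } while (nextToken != null);
    return partitions;
  }
}
```

The flag only makes sense if the caller never reads the partition's column list. If later code does need it, the flag has to stay off.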
##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -355,6 +336,43 @@ public void updateTableSchema(String tableName, MessageType newSchema) {
.withTableInput(updatedTableInput);
awsGlue.updateTable(request);
+
+ if (!table.getPartitionKeys().isEmpty() && cascade) {
Review Comment:
Isn't the `cascade` check redundant with the `table.getPartitionKeys().isEmpty()` check?
##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -330,7 +312,6 @@ && getTable(awsGlue, databaseName, tableName).getPartitionKeys().equals(partitio
@Override
public void updateTableSchema(String tableName, MessageType newSchema) {
- // ToDo Cascade is set in Hive meta sync, but need to investigate how to configure it for Glue meta
boolean cascade = config.getSplitStrings(META_SYNC_PARTITION_FIELDS).size() > 0;
Review Comment:
I think we should cascade only when the issue actually occurs, that is, when
schema evolution adds new, unordered fields to a struct. In the general case
there is no need to cascade.
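
A rough sketch of what such a check could look like, under one possible reading of the condition. `CascadeDecision.needsCascade` is a hypothetical helper, not existing Hudi code. It assumes the old and new schemas are available as maps from column name to Hive type string, and it treats any change to an existing struct-typed column as the trigger:

```java
import java.util.Map;

class CascadeDecision {
  // Hypothetical helper: returns true only when a column that already existed
  // is struct-typed in the new schema and its type string has changed.
  // Newly added top-level columns do not trigger a cascade under this reading.
  static boolean needsCascade(Map<String, String> oldCols, Map<String, String> newCols) {
    for (Map.Entry<String, String> entry : newCols.entrySet()) {
      String oldType = oldCols.get(entry.getKey());
      String newType = entry.getValue();
      if (oldType != null && newType.startsWith("struct<") && !newType.equals(oldType)) {
        return true;
      }
    }
    return false;
  }
}
```

With a check like this, `updateTableSchema` could skip the per-partition update for plain column additions and pay the cost only for struct changes.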