[GitHub] [hudi] parisni commented on a diff in pull request #9071: [HUDI-6453] Cascade Glue schema changes to partitions

via GitHub Thu, 06 Jul 2023 04:56:57 -0700


parisni commented on code in PR #9071:
URL: https://github.com/apache/hudi/pull/9071#discussion_r1254221958



##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -569,6 +587,53 @@ private static Table getTable(AWSGlue awsGlue, String databaseName, String table
     }
   }
 
+  // TODO: make this faster with Glue Segment API
+  private static List<com.amazonaws.services.glue.model.Partition> getAllGluePartitions(AWSGlue awsGlue,
+                                                                                        String databaseName,
+                                                                                        String tableName) {
+    try {
+      List<com.amazonaws.services.glue.model.Partition> partitions = new ArrayList<>();
+      String nextToken = null;
+      do {
+        GetPartitionsResult result = awsGlue.getPartitions(new GetPartitionsRequest()
+            .withDatabaseName(databaseName)
+            .withTableName(tableName)
+            .withNextToken(nextToken));

Review Comment:
   Could we set `.withExcludeColumnSchema(true)` on this request to limit network transfer?
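
   To make the suggestion concrete, here is a minimal, self-contained sketch of the token-based paging loop with an "exclude column schema" flag passed on every page request. It does not use the AWS SDK: `Page`, `PartitionSource`, and `fetchAll` are hypothetical stand-ins for the Glue response, the Glue client call, and `getAllGluePartitions`. In the real code the flag would presumably be set with `.withExcludeColumnSchema(true)` on the `GetPartitionsRequest`.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionPagingSketch {

  // Hypothetical stand-in for one page of a GetPartitions response.
  static final class Page {
    final List<String> partitions;
    final String nextToken; // null when there are no more pages

    Page(List<String> partitions, String nextToken) {
      this.partitions = partitions;
      this.nextToken = nextToken;
    }
  }

  // Hypothetical stand-in for the Glue client call; not an AWS SDK type.
  interface PartitionSource {
    Page fetch(String nextToken, boolean excludeColumnSchema);
  }

  // Collects every page, forwarding the flag on each request so that no page
  // carries the (potentially large) per-partition column schema.
  static List<String> fetchAll(PartitionSource source, boolean excludeColumnSchema) {
    List<String> all = new ArrayList<>();
    String nextToken = null;
    do {
      Page page = source.fetch(nextToken, excludeColumnSchema);
      all.addAll(page.partitions);
      nextToken = page.nextToken;
    } while (nextToken != null);
    return all;
  }
}
```

   One caveat with this approach: if the cascade step needs each partition's existing columns, excluding them here would require rebuilding the storage descriptor from the table schema instead.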



##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -355,6 +336,43 @@ public void updateTableSchema(String tableName, MessageType newSchema) {
           .withTableInput(updatedTableInput);
 
       awsGlue.updateTable(request);
+
+      if (!table.getPartitionKeys().isEmpty() && cascade) {

Review Comment:
   Isn't `cascade` redundant with the `table.getPartitionKeys().isEmpty()` check?



##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -330,7 +312,6 @@ && getTable(awsGlue, databaseName, tableName).getPartitionKeys().equals(partitio
 
   @Override
   public void updateTableSchema(String tableName, MessageType newSchema) {
-    // ToDo Cascade is set in Hive meta sync, but need to investigate how to configure it for Glue meta
     boolean cascade = config.getSplitStrings(META_SYNC_PARTITION_FIELDS).size() > 0;

Review Comment:
   I guess we should cascade only when the issue actually occurs, that is, when
the schema evolution targets new unordered struct fields. In the general case
there is no need to cascade.


