buiducsinh34 opened a new issue, #9806:
URL: https://github.com/apache/hudi/issues/9806

   **Describe the problem you faced**
   
   AWS Glue sync fails when an overwrite operation is performed on a Hudi table with more than 25 partitions.
   AWS Glue enforces a constraint on the `BatchDeletePartition` request: the `PartitionsToDelete` member may contain at most 25 entries.
   Reference source: 
https://docs.aws.amazon.com/glue/latest/webapi/API_BatchDeletePartition.html#Glue-BatchDeletePartition-request-PartitionsToDelete
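   A minimal sketch of the obvious workaround: split the partition list into chunks of at most 25 and issue one `BatchDeletePartition` request per chunk. The `GluePartitionBatcher` class and `chunk` helper below are illustrative, not Hudi code; `AWSGlueCatalogSyncClient.dropPartitions` would need an equivalent loop around `batchDeletePartition`.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class GluePartitionBatcher {
       // AWS Glue's BatchDeletePartition accepts at most 25 entries per call
       static final int GLUE_BATCH_LIMIT = 25;

       /** Split a list into consecutive chunks no larger than {@code size}. */
       static <T> List<List<T>> chunk(List<T> items, int size) {
           List<List<T>> out = new ArrayList<>();
           for (int i = 0; i < items.size(); i += size) {
               out.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
           }
           return out;
       }

       public static void main(String[] args) {
           List<String> partitions = new ArrayList<>();
           for (int i = 1; i <= 100; i++) {
               partitions.add("partition" + i);
           }
           // each chunk would be submitted as one BatchDeletePartition request
           List<List<String>> batches = chunk(partitions, GLUE_BATCH_LIMIT);
           // 100 partitions -> 4 batches of 25
           System.out.println(batches.size() + " batches, largest="
               + batches.stream().mapToInt(List::size).max().orElse(0));
       }
   }
   ```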
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Generate a Hudi table with 100 partitions via bulk insert, with AWS Glue sync enabled. A Glue table named `example_glue_table` is created.
   2. Re-generate the table via bulk insert with updated data, again with Glue sync enabled.
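   For reference, a sketch of the write options that should reproduce this. The option keys are standard Hudi configuration; the table, key, and partition field names are illustrative:

   ```properties
   hoodie.table.name=example_glue_table
   hoodie.datasource.write.operation=bulk_insert
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=partition_col
   # enable meta sync via the Glue catalog sync tool (class name from the stacktrace)
   hoodie.datasource.meta.sync.enable=true
   hoodie.meta.sync.client.tool.class=org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
   ```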
   
   **Expected behavior**
   
   Glue sync should succeed, splitting the partition deletes into batches of at most 25. Instead, it fails with the error message:
   ```
   org.apache.hudi.com.amazonaws.services.glue.model.ValidationException: 1 
validation error detected: Value '[PartitionValueList(values=[partition1]), 
PartitionValueList(values=[partition2]), 
PartitionValueList(values=[partition3]), ...(96 more) 
PartitionValueList(values=[partition100])]' at 'partitionsToDelete' failed to 
satisfy constraint: Member must have length less than or equal to 25 (Service: 
AWSGlue; Status Code: 400; Error Code: ValidationException; Request ID: 
15e38477-6931-484c-ac6a-51f0c2cfe506; Proxy: null)
   ```
   
   **Environment Description**
   
   * Hudi version : 0.13.1
   
   * Spark version : 3.4.0
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.3
   
   * AWS EMR version: 6.12.0
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   N/A
   
   **Stacktrace**
   
   ```
   23/09/29 05:02:11 INFO ApplicationMaster: Unregistering ApplicationMaster 
with FAILED (diag message: User class threw exception: 
org.apache.hudi.exception.HoodieException: Could not sync using the meta sync 
class org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
       at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
       at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:888)
       at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
       at 
org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:886)
       at 
org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:826)
       at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:322)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
       at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
       at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
       at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
       at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
       at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
       at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
       at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
       at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:123)
       at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:160)
       at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
       at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
       at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:160)
       at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:271)
       at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:159)
       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
       at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:69)
       at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
       at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
       at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:554)
       at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:107)
       at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:554)
       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
       at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
       at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
       at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
       at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
       at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:530)
       at 
org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:97)
       at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:84)
       at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:82)
       at 
org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
       at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
       at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387)
       at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
       ...
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:760)
   Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
when hive syncing example_glue_table
       at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:165)
       at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:59)
       ... 56 more
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
partitions for table example_glue_table
       at 
org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:403)
       at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:272)
       at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:174)
       at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:162)
       ... 57 more
   Caused by: org.apache.hudi.aws.sync.HoodieGlueSyncException: Fail to drop 
partitions to example_glue_database.example_glue_table
       at 
org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.dropPartitions(AWSGlueCatalogSyncClient.java:222)
       at 
org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:457)
       at 
org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:399)
       ... 60 more
   Caused by: 
org.apache.hudi.com.amazonaws.services.glue.model.ValidationException: 1 
validation error detected: Value '[PartitionValueList(values=[partition1]), 
PartitionValueList(values=[partition2]), 
PartitionValueList(values=[partition3]), ...(96 more) 
PartitionValueList(values=[partition100])]' at 'partitionsToDelete' failed to 
satisfy constraint: Member must have length less than or equal to 25 (Service: 
AWSGlue; Status Code: 400; Error Code: ValidationException; Request ID: ...; 
Proxy: null)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
       at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
       at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:13784)
       at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:13751)
       at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:13740)
       at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.executeBatchDeletePartition(AWSGlueClient.java:406)
       at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.batchDeletePartition(AWSGlueClient.java:375)
       at 
org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.dropPartitions(AWSGlueCatalogSyncClient.java:214)
       ... 62 more
   )
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to