buiducsinh34 opened a new issue, #9806: URL: https://github.com/apache/hudi/issues/9806
**Describe the problem you faced**

AWS Glue sync fails when an overwrite action is performed on a Hudi table with more than 25 partitions. AWS Glue enforces a constraint on the `BatchDeletePartition` request: `PartitionsToDelete` may contain no more than 25 entries. Reference: https://docs.aws.amazon.com/glue/latest/webapi/API_BatchDeletePartition.html#Glue-BatchDeletePartition-request-PartitionsToDelete

**To Reproduce**

Steps to reproduce the behavior:

1. Generate a Hudi table via bulk insert with AWS Glue sync enabled, using 100 partitions. A Glue table is created, namely `example_glue_table`.
2. Re-generate the table via bulk insert with updated data, Glue sync enabled.

**Expected behavior**

Glue sync should complete successfully. Instead, it fails with the error message:

```
org.apache.hudi.com.amazonaws.services.glue.model.ValidationException: 1 validation error detected: Value '[PartitionValueList(values=[partition1]), PartitionValueList(values=[partition2]), PartitionValueList(values=[partition3]), ...(96 more) PartitionValueList(values=[partition100])]' at 'partitionsToDelete' failed to satisfy constraint: Member must have length less than or equal to 25 (Service: AWSGlue; Status Code: 400; Error Code: ValidationException; Request ID: 15e38477-6931-484c-ac6a-51f0c2cfe506; Proxy: null)
```

**Environment Description**

* Hudi version : 0.13.1
* Spark version : 3.4.0
* Hive version : 3.1.3
* Hadoop version : 3.3.3
* AWS EMR version : 6.12.0
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

N/A

**Stacktrace**

```
23/09/29 05:02:11 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:888)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:886)
	at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:826)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:322)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:123)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:160)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:160)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:271)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:159)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:69)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:554)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:107)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:554)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:530)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:97)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:84)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:82)
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
	...
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:760)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing example_glue_table
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:165)
	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:59)
	... 56 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table example_glue_table
	at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:403)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:272)
	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:174)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:162)
	... 57 more
Caused by: org.apache.hudi.aws.sync.HoodieGlueSyncException: Fail to drop partitions to example_glue_database.example_glue_table
	at org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.dropPartitions(AWSGlueCatalogSyncClient.java:222)
	at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:457)
	at org.apache.hudi.hive.HiveSyncTool.syncAllPartitions(HiveSyncTool.java:399)
	... 60 more
Caused by: org.apache.hudi.com.amazonaws.services.glue.model.ValidationException: 1 validation error detected: Value '[PartitionValueList(values=[partition1]), PartitionValueList(values=[partition2]), PartitionValueList(values=[partition3]), ...(96 more) PartitionValueList(values=[partition100])]' at 'partitionsToDelete' failed to satisfy constraint: Member must have length less than or equal to 25 (Service: AWSGlue; Status Code: 400; Error Code: ValidationException; Request ID: ...; Proxy: null)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
	at org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
	at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:13784)
	at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:13751)
	at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:13740)
	at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.executeBatchDeletePartition(AWSGlueClient.java:406)
	at org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.batchDeletePartition(AWSGlueClient.java:375)
	at org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.dropPartitions(AWSGlueCatalogSyncClient.java:214)
	... 62 more
)
```
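The trace shows that `AWSGlueCatalogSyncClient.dropPartitions` passes the full partition list to a single `batchDeletePartition` call, which trips the 25-entry cap on `PartitionsToDelete`. The obvious shape of a fix is to split the partition list into chunks of at most 25 and issue one `BatchDeletePartition` request per chunk. A minimal sketch of that chunking step, assuming illustrative names (`PartitionBatcher`, `chunk`) that are not Hudi's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionBatcher {
    // AWS Glue caps PartitionsToDelete at 25 entries per BatchDeletePartition request.
    static final int MAX_DELETE_BATCH_SIZE = 25;

    // Split a list into consecutive sub-lists of at most `size` elements each.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return batches;
    }
}
```

With this in place, the sync client would loop over `chunk(partitionsToDelete, MAX_DELETE_BATCH_SIZE)` and call `batchDeletePartition` once per batch, so 100 partitions become four requests of 25 instead of one oversized request.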
