gaoshihang opened a new issue, #7503: URL: https://github.com/apache/hudi/issues/7503
**Describe the problem you faced**

When I do a `delete_partition` operation, this exception is thrown during the hive-sync phase. **It only happens when the files to be deleted are the last committed data files.**

```
User class threw exception: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing pgao_test_delete_partition_1
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:118)
	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:539)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:595)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:591)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:591)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:665)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:286)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:126)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:962)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:767)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:962)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:414)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:398)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:287)
	at tv.freewheel.reporting.river.spark.engine.RiverSparkBatchEngineContext.standardSinkingToHudi(RiverSparkBatchEngineContext.java:188)
	at tv.freewheel.reporting.river.common.engine.RiverEngineContext.sinkToHudi(RiverEngineContext.java:131)
	at tv.freewheel.reporting.river.core.RiverMain.main(RiverMain.java:42)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:728)
Caused by: org.apache.hudi.sync.common.HoodieSyncException: Failed to read data schema
	at org.apache.hudi.sync.common.AbstractSyncHoodieClient.getDataSchema(AbstractSyncHoodieClient.java:158)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:171)
	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:129)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:115)
	... 37 more
Caused by: java.lang.IllegalArgumentException: Failed to read schema from data file s3a://lake-house-hudi/pgao_test_delete_partition_1/1849/liveramp/523319/dd689c6344/e4043ee1-c8b8-4e5c-8788-b86d35d437d0-0_13-25-11859_20221206065637234.parquet. File does not exist.
	at org.apache.hudi.common.table.TableSchemaResolver.readSchemaFromBaseFile(TableSchemaResolver.java:451)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:95)
	at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchema(TableSchemaResolver.java:209)
	at org.apache.hudi.sync.common.AbstractSyncHoodieClient.getDataSchema(AbstractSyncHoodieClient.java:155)
	... 40 more
```

**To Reproduce**

Steps to reproduce the behavior:

1. Upsert a dataframe into Hudi
2. Do `delete_partition` on this same dataframe

**Environment Description**

* Hudi version : 0.10.1
* Spark version : 3.0.3
* Hive version : 2.3.7
* Hadoop version : 3.3.0
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
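For reference, the two-step reproduction can be sketched with the Spark datasource API roughly as follows. This is a minimal, untested sketch: the record-key, precombine, and partition field names (`id`, `ts`, `partition`) and the partition value are made-up placeholders, and the option keys are taken from the Hudi 0.10.x write configs (verify them against your version). The actual `df.write` calls are left as comments since they need a live SparkSession with the Hudi bundle.

```python
# Shared Hudi write options for the hypothetical test table.
hudi_base_opts = {
    "hoodie.table.name": "pgao_test_delete_partition_1",
    "hoodie.datasource.write.recordkey.field": "id",             # assumed key field
    "hoodie.datasource.write.precombine.field": "ts",            # assumed precombine field
    "hoodie.datasource.write.partitionpath.field": "partition",  # assumed partition field
    "hoodie.datasource.hive_sync.enable": "true",
}

# Step 1: upsert the dataframe into the table.
upsert_opts = dict(hudi_base_opts, **{
    "hoodie.datasource.write.operation": "upsert",
})

# Step 2: delete the partitions just written. When the files dropped here are
# the last committed base files, hive sync reportedly fails because it tries to
# read the table schema from a file that no longer exists.
delete_opts = dict(hudi_base_opts, **{
    "hoodie.datasource.write.operation": "delete_partition",
    "hoodie.datasource.write.partitions.to.delete": "some_partition",  # placeholder
})

# With a SparkSession and dataframe `df` available, the two writes would be:
# df.write.format("hudi").options(**upsert_opts).mode("append").save(base_path)
# df.write.format("hudi").options(**delete_opts).mode("append").save(base_path)
```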
