zuyanton opened a new issue #1764:
URL: https://github.com/apache/hudi/issues/1764
**Describe the problem you faced**
We are running a MoR table on EMR + Hudi + S3 with `hoodie.consistency.check.enabled` set to `true` and compaction set to run inline. We update the table every ten minutes with new data. We are seeing the following issue (actually two issues).
The first issue is that compaction fails from time to time with the exception
```HoodieCommitException: Failed to complete commit 20200624012710 due to finalize errors. caused by HoodieIOException: Consistency check failed to ensure all files APPEAR.```
It looks like Hudi tries to clean up duplicate data files created by Spark retries, but the consistency check fails because the files are not there. The error does not appear when we disable the consistency check by setting `hoodie.consistency.check.enabled` to `false`, because Hudi then proceeds with its attempt to delete the non-existent duplicate files and wraps up the commit successfully. However, since we use S3, having the consistency check disabled is not ideal. The first issue happens more often on bigger tables (>400 GB) than on small ones (<100 GB).
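To make the failure mode concrete, here is a minimal sketch (not Hudi's actual code; the class and method names are hypothetical) of the kind of poll-with-backoff visibility check that fails here: it repeatedly tests a predicate with exponential backoff and gives up after a fixed number of attempts. A duplicate file that a Spark retry already removed, or that an eventually-consistent S3 listing never surfaces, will never "APPEAR", so the check exhausts its attempts and the commit finalize step throws.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/**
 * Hedged sketch of a consistency check: poll a condition with exponential
 * backoff, returning false if it never holds. Hudi's real implementation
 * differs; this only illustrates why a file that is already gone makes the
 * check time out instead of succeed.
 */
public class ConsistencyCheckSketch {
    /** Returns true iff the condition became true within maxAttempts polls. */
    public static boolean waitFor(BooleanSupplier condition,
                                  long initialDelayMs,
                                  int maxAttempts) throws InterruptedException {
        long delay = initialDelayMs;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;            // file became visible
            }
            TimeUnit.MILLISECONDS.sleep(delay);
            delay *= 2;                 // exponential backoff between polls
        }
        return false;                   // caller wraps this in a HoodieIOException
    }

    public static void main(String[] args) throws InterruptedException {
        // A file deleted by a Spark retry never appears, so the check fails.
        boolean appeared = waitFor(() -> false, 1, 3);
        System.out.println(appeared ? "APPEARED" : "check failed: file never appeared");
    }
}
```

A fix on the Hudi side would presumably need to treat "file to delete is already absent" as success rather than waiting for it to appear.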
The second issue is that after the first issue happens, Hudi never changes the commit status, and it stays INFLIGHT forever. This causes several other problems: log files with the same fileId as the parquet files that were part of the failed compaction never get compacted, and Hudi starts ignoring the cleaning settings and stops removing the commits that happen after the failed one. Although the second issue in our case is caused by the first, it still doesn't seem right to leave a compaction in INFLIGHT status after a failure.
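A stuck instant like this can be spotted by scanning the table's `.hoodie` timeline directory for a compaction inflight marker that never gained a matching completed commit. The sketch below assumes the Hudi 0.5.x timeline file-name conventions (`<ts>.compaction.inflight` for an inflight compaction, `<ts>.commit` for a completed one); verify these against your own table before relying on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

/**
 * Hedged sketch: report compaction instants stuck INFLIGHT, i.e. timeline
 * entries with a "<ts>.compaction.inflight" marker but no completed
 * "<ts>.commit". File-name conventions assumed from Hudi 0.5.x.
 */
public class StuckCompactionFinder {
    public static List<String> findStuckInflight(Path hoodieDir) throws IOException {
        Set<String> completed = new HashSet<>();
        List<String> inflight = new ArrayList<>();
        try (Stream<Path> files = Files.list(hoodieDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                String name = p.getFileName().toString();
                int dot = name.indexOf('.');
                if (dot < 0) continue;
                String instant = name.substring(0, dot);   // the timestamp prefix
                if (name.endsWith(".commit")) {
                    completed.add(instant);                // compaction finished
                } else if (name.endsWith(".compaction.inflight")) {
                    inflight.add(instant);                 // compaction started
                }
            }
        }
        inflight.removeAll(completed);                     // keep only stuck ones
        Collections.sort(inflight);
        return inflight;
    }
}
```

In the scenario above, the failed instant (e.g. 20200624012710) would keep showing up in this list on every run, since nothing ever rolls it back or completes it.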
**To Reproduce**
Create a MoR table with ~100 partitions on S3, then run updates for a while with the consistency check enabled and compaction set to run inline. Eventually one of the compaction jobs should fail, and the compaction commit should stay in INFLIGHT status.
**Environment Description**
* Hudi version : 0.5.3
* Spark version : 2.4.4
* Hive version :
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Additional context**
Hudi settings that we use:
```
"hoodie.consistency.check.enabled" -> "true",
"hoodie.compact.inline.max.delta.commits" -> "12",
"hoodie.compact.inline" -> "true",
"hoodie.clean.automatic" -> "true",
"hoodie.cleaner.commits.retained" -> "2",
DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
```
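These options are plain string key/value pairs handed to the Spark DataFrameWriter. A sketch of how they might be assembled, with a note on what each one does (the mapping of `DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY` to `hoodie.datasource.hive_sync.enable` is our understanding of Hudi 0.5.x and should be double-checked):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: the write options above as a map, roughly as they would be passed
 * via df.write().format("hudi").options(opts). Annotations reflect our
 * reading of the Hudi 0.5.x config docs.
 */
public class HudiWriteOptions {
    public static Map<String, String> build() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.consistency.check.enabled", "true");      // verify S3 file visibility after writes
        opts.put("hoodie.compact.inline.max.delta.commits", "12"); // compact after 12 delta commits
        opts.put("hoodie.compact.inline", "true");                 // run compaction inside the writer job
        opts.put("hoodie.clean.automatic", "true");                // clean old file versions after commits
        opts.put("hoodie.cleaner.commits.retained", "2");          // retain only the 2 latest commits
        opts.put("hoodie.datasource.hive_sync.enable", "true");    // sync table metadata to Hive
        return opts;
    }
}
```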
**Stacktrace**
```
20/06/24 01:38:05 INFO HoodieTable: Removing duplicate data files created due to spark retries before committing. Paths=[s3://bucketName/tableName/30/5bb5c4d5-a54a-4682-93d1-98ef3222d887-1_0-30-9408_20200624012710.parquet]
20/06/24 01:42:22 ERROR ApplicationMaster: User class threw exception: org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 20200624012710 due to finalize errors.
org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 20200624012710 due to finalize errors.
	at org.apache.hudi.client.AbstractHoodieWriteClient.finalizeWrite(AbstractHoodieWriteClient.java:204)
	at org.apache.hudi.client.HoodieWriteClient.doCompactionCommit(HoodieWriteClient.java:1129)
	at org.apache.hudi.client.HoodieWriteClient.commitCompaction(HoodieWriteClient.java:1089)
	at org.apache.hudi.client.HoodieWriteClient.runCompaction(HoodieWriteClient.java:1072)
	at org.apache.hudi.client.HoodieWriteClient.compact(HoodieWriteClient.java:1043)
	at org.apache.hudi.client.HoodieWriteClient.lambda$forceCompact$12(HoodieWriteClient.java:1158)
	at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
	at org.apache.hudi.client.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1155)
	at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:502)
	at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
	at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
	at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
	at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at com.amazon.fdl.components.compaction.job.CompactionHudiJob2$.main(CompactionHudiJob2.scala:147)
	at com.amazon.fdl.components.compaction.job.CompactionHudiJob2.main(CompactionHudiJob2.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: org.apache.hudi.exception.HoodieIOException: Consistency check failed to ensure all files APPEAR
	at org.apache.hudi.table.HoodieTable.waitForAllFiles(HoodieTable.java:431)
	at org.apache.hudi.table.HoodieTable.cleanFailedWrites(HoodieTable.java:379)
	at org.apache.hudi.table.HoodieTable.finalizeWrite(HoodieTable.java:315)
	at org.apache.hudi.table.HoodieMergeOnReadTable.finalizeWrite(HoodieMergeOnReadTable.java:319)
	at org.apache.hudi.client.AbstractHoodieWriteClient.finalizeWrite(AbstractHoodieWriteClient.java:195)
	... 42 more
```