hudi-bot opened a new issue, #15772: URL: https://github.com/apache/hudi/issues/15772
Sometimes the test fails because a rollback in the metadata table rolls back a commit while the deltastreamer is trying to transition that instant from requested to inflight. The transition fails because the requested file has already been removed from the timeline. Here is an example of a failing [test stack trace|https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=15021&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=746585d8-b50a-55c3-26c5-517d93af9934&l=30526]:

{code:java}
Caused by: java.lang.IllegalArgumentException
    at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:633)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:698)
    at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:147)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:172)
    at org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:44)
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:111)
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:80)
    at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:154)
    at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:172)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:823)
    at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:890)
    at org.apache.hudi.client.BaseHoodieWriteClient.lambda$writeTableMetadata$1(BaseHoodieWriteClient.java:355)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:355)
    at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:282)
    at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:233)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:102)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:61)
    at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:199)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:713)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:395)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:716)
{code}

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-5733
- Type: Bug
- Epic: https://issues.apache.org/jira/browse/HUDI-4302
- Fix version(s):
  - 1.1.0

---

## Comments

10/Feb/23 17:20;shivnarayan;Recently we added eager rollbacks to the MDT. Hence, a delta commit started by HoodieIndexer will be rolled back by the regular writer process. Similarly, a HoodieIndexer delta commit, when started, could roll back a delta commit in the MDT from the regular ingestion writer.;;;

---

10/Feb/23 17:23;shivnarayan;Two viable options I see:
1. Make LogRecordReader work with multiple writers with respect to rollback blocks. Then we can switch the MDT to the lazy clean policy for rollbacks. Neat approach, but it needs good testing since it touches regular log block reads (for any MOR table).
2. A little more involved and not-so-clean fix: apply eager rollbacks only for regular delta commits. Identify delta commits from HoodieIndexer and employ the lazy clean policy (based on heartbeat).
;;;

---

10/Feb/23 18:23;guoyihua;We'll take the second approach for fixing 0.13.0 and follow up with the first approach to properly fix the problem.;;;
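A toy, single-process model of the metadata table timeline can illustrate the failure described in this issue. It is a sketch under stated assumptions, not Hudi code:

- `SketchTimeline`, `InstantState` and `TimelineRaceDemo` are hypothetical names.
- An in-memory map stands in for the instant files on storage.
- The `IllegalArgumentException` thrown by `transitionRequestedToInflight` only mimics the `ValidationUtils.checkArgument` precondition seen in the stack trace.

The model shows the ordering that breaks. Writer A creates a requested delta commit. A second writer then eagerly rolls it back. Finally, writer A's requested-to-inflight transition fails.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of an instant timeline; NOT Hudi's HoodieActiveTimeline API.
enum InstantState { REQUESTED, INFLIGHT, COMPLETED }

final class SketchTimeline {
    // Instant time -> current state; stands in for the instant files on storage.
    private final Map<String, InstantState> instants = new HashMap<>();

    synchronized void createRequested(String instantTime) {
        instants.put(instantTime, InstantState.REQUESTED);
    }

    // Mimics the failing precondition: the requested instant must still be on the timeline.
    synchronized void transitionRequestedToInflight(String instantTime) {
        if (instants.get(instantTime) != InstantState.REQUESTED) {
            throw new IllegalArgumentException(
                "Requested instant " + instantTime + " not found on the timeline");
        }
        instants.put(instantTime, InstantState.INFLIGHT);
    }

    // Eager rollback: removes a pending (not yet completed) instant; returns true if one was removed.
    synchronized boolean rollbackPending(String instantTime) {
        InstantState state = instants.get(instantTime);
        if (state == null || state == InstantState.COMPLETED) {
            return false;
        }
        instants.remove(instantTime);
        return true;
    }

    synchronized InstantState stateOf(String instantTime) {
        return instants.get(instantTime);
    }
}

class TimelineRaceDemo {
    public static void main(String[] args) {
        SketchTimeline mdt = new SketchTimeline();
        // Writer A (the deltastreamer's write client) creates a requested delta commit on the MDT.
        mdt.createRequested("001");
        // Writer B (e.g. HoodieIndexer) starts its own delta commit and eagerly rolls back "001".
        mdt.rollbackPending("001");
        try {
            // Writer A resumes and tries to move "001" to inflight.
            mdt.transitionRequestedToInflight("001");
            System.out.println("transition succeeded");
        } catch (IllegalArgumentException e) {
            // prints: transition failed: Requested instant 001 not found on the timeline
            System.out.println("transition failed: " + e.getMessage());
        }
    }
}
```

Without the intervening `rollbackPending` call, the same transition succeeds and the instant ends up `INFLIGHT`.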

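Option 2 from the discussion can be sketched as a rollback-planning rule. This reflects one reading of that comment and is not the actual Hudi fix. Several parts are hypothetical:

- `CommitOrigin`, `PendingDeltaCommit` and `RollbackPlanner` are invented names.
- The explicit `origin` field is an assumption. How Hudi would actually identify an indexer delta commit is not specified here.
- The heartbeat is reduced to a last-seen timestamp plus a timeout.

Under this reading, only a starting regular delta commit triggers eager rollbacks. Pending regular delta commits are rolled back immediately, since the eager policy assumes a single regular writer, so a pending one belongs to a failed attempt. Pending HoodieIndexer delta commits are left alone until their heartbeat has expired.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "option 2"; these types are NOT part of Hudi.
enum CommitOrigin { REGULAR_WRITER, INDEXER }

final class PendingDeltaCommit {
    final String instantTime;
    final CommitOrigin origin;          // assumed to be known for each pending commit
    final long lastHeartbeatMillis;     // last heartbeat recorded for this pending commit

    PendingDeltaCommit(String instantTime, CommitOrigin origin, long lastHeartbeatMillis) {
        this.instantTime = instantTime;
        this.origin = origin;
        this.lastHeartbeatMillis = lastHeartbeatMillis;
    }
}

final class RollbackPlanner {
    private final long heartbeatTimeoutMillis;

    RollbackPlanner(long heartbeatTimeoutMillis) {
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
    }

    // Decides which pending MDT delta commits a newly starting commit may roll back.
    List<String> instantsToRollback(CommitOrigin startingWriter,
                                    List<PendingDeltaCommit> pending, long nowMillis) {
        List<String> toRollback = new ArrayList<>();
        if (startingWriter != CommitOrigin.REGULAR_WRITER) {
            // Eager rollbacks are applied only when a regular delta commit starts.
            return toRollback;
        }
        for (PendingDeltaCommit commit : pending) {
            if (commit.origin == CommitOrigin.REGULAR_WRITER) {
                // Eager policy: a pending regular delta commit is treated as a failed write.
                toRollback.add(commit.instantTime);
            } else if (nowMillis - commit.lastHeartbeatMillis > heartbeatTimeoutMillis) {
                // Lazy policy: an indexer delta commit is rolled back only once its heartbeat expired.
                toRollback.add(commit.instantTime);
            }
        }
        return toRollback;
    }
}

class RollbackPlannerDemo {
    public static void main(String[] args) {
        RollbackPlanner planner = new RollbackPlanner(60_000L);
        List<PendingDeltaCommit> pending = new ArrayList<>();
        pending.add(new PendingDeltaCommit("001", CommitOrigin.REGULAR_WRITER, 0L));
        pending.add(new PendingDeltaCommit("002", CommitOrigin.INDEXER, 90_000L)); // heartbeat fresh
        pending.add(new PendingDeltaCommit("003", CommitOrigin.INDEXER, 10_000L)); // heartbeat expired
        System.out.println(planner.instantsToRollback(CommitOrigin.REGULAR_WRITER, pending, 100_000L));
        // prints [001, 003]
    }
}
```

With a 60 s timeout at time 100 000 ms, the regular writer rolls back `001` (eager) and `003` (stale indexer heartbeat). It leaves `002` alone because that indexer is still heartbeating. A starting indexer commit rolls back nothing in this sketch.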