yihua opened a new pull request, #8001:
URL: https://github.com/apache/hudi/pull/8001
### Change Logs
The metadata table writer used by the async indexer is configured with the
`LAZY` failed writes cleaning policy, but `SparkHoodieBackedTableMetadataWriter`
is hard-coded to roll back failed writes regardless of that configuration. The
async indexer should never trigger this rollback. With the current logic, the
async indexer can roll back an inflight delta commit that belongs to another
regular writer in the metadata table, which causes issues and makes the test
shown in the log below flaky.
This PR fixes `SparkHoodieBackedTableMetadataWriter` so that the async indexer
does not trigger the rollback of failed writes.
```
2023-02-16T13:46:06.1573775Z [ERROR] Tests run: 113, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 3,518.191 s <<< FAILURE! - in org.apache.hudi.utilities.deltastreamer.TestHoodieDeltaStreamer
2023-02-16T13:46:06.1576031Z [ERROR] testHoodieIndexer{HoodieRecordType}[2] Time elapsed: 79.838 s <<< ERROR!
...
2023-02-16T13:46:06.1705711Z Caused by: java.lang.IllegalArgumentException
2023-02-16T13:46:06.1706251Z at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
2023-02-16T13:46:06.1706995Z at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:633)
2023-02-16T13:46:06.1707847Z at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:698)
2023-02-16T13:46:06.1708751Z at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:147)
2023-02-16T13:46:06.1709792Z at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:172)
2023-02-16T13:46:06.1710733Z at org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:44)
2023-02-16T13:46:06.1712815Z at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:111)
2023-02-16T13:46:06.1713593Z at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:80)
2023-02-16T13:46:06.1714353Z at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:154)
2023-02-16T13:46:06.1715155Z at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:186)
...
```
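The intended behavior described in the change log can be illustrated with a minimal, self-contained sketch. It is not the actual patch: `RollbackPolicySketch`, `CleaningPolicy`, `MetadataWriterSketch`, and the instant times are hypothetical names made up for illustration, and the real `SparkHoodieBackedTableMetadataWriter` is far more involved. The only point shown is that the rollback of failed writes is gated on the configured policy, so a writer configured as `LAZY` (as the async indexer is) leaves another writer's inflight instant alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RollbackPolicySketch {

    /** Hypothetical stand-in for the failed writes cleaning policy. */
    enum CleaningPolicy { EAGER, LAZY }

    /** Hypothetical, heavily simplified model of a metadata table writer. */
    static class MetadataWriterSketch {
        private final CleaningPolicy policy;
        private final List<String> inflightInstants;
        private final List<String> rolledBack = new ArrayList<>();

        MetadataWriterSketch(CleaningPolicy policy, List<String> inflightInstants) {
            this.policy = policy;
            // Copy so the caller's list is never mutated.
            this.inflightInstants = new ArrayList<>(inflightInstants);
        }

        /** Commits an instant; rolls back inflight instants only under EAGER. */
        void commit(String instantTime) {
            if (policy == CleaningPolicy.EAGER) {
                // Assumption: under EAGER every inflight instant is treated as failed.
                rolledBack.addAll(inflightInstants);
                inflightInstants.clear();
            }
            // Under LAZY nothing is rolled back here; an inflight instant may
            // belong to a concurrent writer that is still running.
            // The actual write for instantTime is omitted in this sketch.
        }

        List<String> rolledBack() { return rolledBack; }

        List<String> inflightInstants() { return inflightInstants; }
    }

    public static void main(String[] args) {
        // One inflight delta commit owned by another (regular) writer.
        List<String> inflight = Arrays.asList("20230216134500");

        MetadataWriterSketch indexer = new MetadataWriterSketch(CleaningPolicy.LAZY, inflight);
        indexer.commit("20230216134600");
        System.out.println("LAZY rolled back:  " + indexer.rolledBack());

        MetadataWriterSketch eager = new MetadataWriterSketch(CleaningPolicy.EAGER, inflight);
        eager.commit("20230216134600");
        System.out.println("EAGER rolled back: " + eager.rolledBack());
    }
}
```

In this sketch the `LAZY` writer reports no rolled-back instants while the `EAGER` writer rolls back the other writer's inflight instant, which mirrors the interference the PR describes.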
### Impact
Fixes the rollback behavior of the async indexer, which also fixes the flaky
test. Adds a new test to guard the behavior (the test fails without this PR).
### Risk level
low
### Documentation Update
N/A
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed