prashantwason opened a new pull request #2216:
URL: https://github.com/apache/hudi/pull/2216
## What is the purpose of the pull request
Add a check to ensure that no data is lost when updating a Hudi dataset.
## Brief change log
- Added a new HoodieWriteConfig setting to enable data loss checks.
- Added a new stat to HoodieWriteStat that tracks the number of records written to the older version of the data file.
- When the data loss check is enabled:
  - Before HoodieMergeHandle is closed, it reads the last version of the data file (if present), finds the number of records written to it, and compares that against the number of records written to the current version.
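
The comparison described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class `DataLossCheckSketch`, the method `verifyNoDataLoss`, the exception type, and all parameter names are hypothetical. The rule it applies (the new version must hold at least as many records as the previous version, minus explicit deletes) is an assumption, since the description only says the two counts are compared.

```java
import java.util.Optional;

/** Hypothetical sketch; none of these names come from the Hudi codebase. */
public class DataLossCheckSketch {

    /** Thrown when the new file version holds fewer records than expected. */
    public static class DataLossException extends RuntimeException {
        public DataLossException(String message) {
            super(message);
        }
    }

    /**
     * Compares the record count of the previous file version with the count
     * written to the new version.
     *
     * Assumption (not stated in the PR): a merge may only drop records through
     * explicit deletes, so recordsWritten must be at least
     * previousVersionRecords - recordsDeleted.
     */
    public static void verifyNoDataLoss(boolean checkEnabled,
                                        Optional<Long> previousVersionRecords,
                                        long recordsWritten,
                                        long recordsDeleted) {
        if (!checkEnabled || !previousVersionRecords.isPresent()) {
            // Check disabled, or no older file version to compare against.
            return;
        }
        long expectedMinimum = previousVersionRecords.get() - recordsDeleted;
        if (recordsWritten < expectedMinimum) {
            throw new DataLossException("Possible data loss: wrote " + recordsWritten
                + " records but expected at least " + expectedMinimum);
        }
    }
}
```

In this sketch the check is a no-op when it is disabled or when no previous file version exists, which mirrors the "if present" condition in the change log.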
## Verify this pull request
This change added tests and can be verified as follows:
Added a unit test to
./hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestHoodieClientOnCopyOnWriteStorage.java
The test can be run as follows:
mvn test -pl hudi-client/hudi-spark-client -Dtest=TestHoodieClientOnCopyOnWriteStorage
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.