rmpifer opened a new pull request #1963:
URL: https://github.com/apache/hudi/pull/1963
## What is the purpose of the pull request
After fetching the HBase index entry for a record, Hudi validates that the
commit timestamp stored in HBase for that record belongs to a `commit` on the
timeline. As a result, any index entry written during a `deltacommit` is
considered invalid, and the record is treated as a new record. The HBase index
is then rewritten every time the record is written, which allows a record to end
up in multiple partitions, or in different file groups within the same partition.
## Brief change log
* Modify `HbaseIndex.checkIfValidCommit` to treat a DELTA_COMMIT timestamp as a
valid index timestamp
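
To illustrate the idea behind this change, here is a minimal, self-contained sketch. It does not use Hudi's actual classes or the real `checkIfValidCommit` signature: the class `CommitValidationSketch`, both methods, and the representation of the timeline as two sets of timestamp strings are hypothetical stand-ins chosen for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: the timeline is modeled as two sets of completed
// instant timestamps, one for commits and one for delta commits.
class CommitValidationSketch {

    // Behavior described as the problem: only `commit` timestamps are valid,
    // so an index entry written by a delta commit is rejected.
    static boolean isValidBefore(Set<String> commits, Set<String> deltaCommits, String indexedTs) {
        return commits.contains(indexedTs);
    }

    // Behavior described as the fix: a timestamp is valid if it belongs to
    // either a commit or a delta commit.
    static boolean isValidAfter(Set<String> commits, Set<String> deltaCommits, String indexedTs) {
        return commits.contains(indexedTs) || deltaCommits.contains(indexedTs);
    }

    public static void main(String[] args) {
        Set<String> commits = new HashSet<>(Arrays.asList("001"));
        Set<String> deltaCommits = new HashSet<>(Arrays.asList("002"));

        // "002" was written by a delta commit on a MOR table.
        System.out.println(isValidBefore(commits, deltaCommits, "002"));
        System.out.println(isValidAfter(commits, deltaCommits, "002"));
    }
}
```

In this sketch, an index entry stamped with a delta commit timestamp is rejected by the first check and accepted by the second, which is the difference that keeps the record from being treated as new.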
## Verify this pull request
This change added tests and can be verified as follows:
- Verified that the test failed on a MOR table before the change and succeeds now
- Manually verified by:
  * Uploading the patched JAR to an EMR 5.30.1 cluster
  * Creating a MOR table with the HBASE index
  * Upserting a record
  * Upserting the record with a new partition
  * Validating that no new partition was created and that the existing
    partition shows the update to the record