Prashant Wason created HUDI-1649:
------------------------------------
Summary: Serious production issues with Metadata Table in 0.7
release
Key: HUDI-1649
URL: https://issues.apache.org/jira/browse/HUDI-1649
Project: Apache Hudi
Issue Type: Sub-task
Reporter: Prashant Wason
Assignee: Prashant Wason
We have discovered the following issues while using the Metadata Table code in
production:
*Issue #1: Automatic rollbacks during commit get a timestamp which is out of
order*
Suppose commit C1 failed. The next commit will try to roll back C1
automatically. This will create the following two instants: C2.commit and
R3.rollback. Hence, the rollback will have a timestamp greater than that of
the commit which occurs after it.
This is because of how the code is implemented in
AbstractHoodieWriteClient.startCommitWithTime() where the timestamp of the next
commit is chosen before the timestamp of the rollback instant.
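The ordering problem can be illustrated with a toy model. The sketch below is
not Hudi code: start_commit and the integer clock are hypothetical stand-ins
for the real instant-time allocation. It only shows that allocating the commit
time before the rollback time makes the rollback sort after the commit.

```python
import itertools

def start_commit(clock, failed_commits):
    """Toy model of the ordering described above: the new commit's time is
    chosen first, and only then is a rollback instant created for each
    previously failed commit."""
    commit_time = next(clock)                  # chosen first
    timeline = []
    for failed in failed_commits:
        rollback_time = next(clock)            # chosen second, so it is larger
        timeline.append((rollback_time, "rollback", failed))
    timeline.append((commit_time, "commit", None))
    # Order by instant time, as a timeline would.
    return sorted(timeline, key=lambda instant: instant[0])

# C1 (time 1) has failed, so the hypothetical clock continues from 2.
clock = itertools.count(2)
timeline = start_commit(clock, failed_commits=[1])
for ts, action, target in timeline:
    print(ts, action, target)
```

In this model the commit is listed at time 2 and the rollback of C1 at time 3,
matching the C2.commit / R3.rollback example above.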
*Issue #2: Syncing of rollbacks is not working*
Due to the above HUDI issue, syncing of rollbacks in Metadata Table does not
work correctly.
Assume the timelines are as follows:
Dataset Timeline: C1 C2 C3
Metadata Timeline: DC1 DC2 (DC = delta-commit)
Suppose the next commit C4 fails. When C5 is attempted, C4 will be
automatically rolled back. Due to Issue #1, the timelines will become as
follows:
Dataset Timeline: C1 C2 C3 C5 R6
Metadata Timeline: DC1 DC2
Now if the Metadata Table is synced (AbstractHoodieWriteClient.postCommit), the
code will end up processing C5 first and then R6, which means that the files
rolled back in R6 will be committed to the Metadata Table as deleted files.
There is logic within HoodieTableMetadataUtils.processRollbackMetadata() to
ignore R6 in this scenario, but it does not work because of Issue #1.
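A minimal sketch of the processing order, under stated assumptions:
sync_to_metadata and the file names f4 and f5 are hypothetical, and the actual
ignore logic in processRollbackMetadata() is not modelled. It only shows that a
sync driven by instant time visits C5 before R6 and then records the rolled
back file as deleted.

```python
def sync_to_metadata(unsynced_instants):
    """Toy sync: apply instants in timestamp order and record file changes."""
    metadata_log = []
    for ts, action, files in sorted(unsynced_instants, key=lambda i: i[0]):
        change = "deleted" if action == "rollback" else "added"
        label = f"{action[0].upper()}{ts}"     # e.g. "C5" or "R6"
        for name in files:
            metadata_log.append((label, name, change))
    return metadata_log

unsynced = [
    (6, "rollback", ["f4"]),   # R6 rolls back the failed commit C4
    (5, "commit", ["f5"]),     # C5 received its timestamp before R6
]
log = sync_to_metadata(unsynced)
for entry in log:
    print(entry)
```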
*Issue #3: Rollback instants are deleted inline*
The current rollback code deletes older instants inline. The delete logic keeps
the oldest ten instants (hardcoded) and removes all more-recent rollback
instants. Furthermore, the deletion ONLY deletes the rollback.complete file and
does not remove the corresponding rollback.inflight file.
Hence, with many rollbacks, the following timeline is possible:
Timeline: C1 C2 C3 C4 R5.inflight C5 C6 C7 ...
(there are 9 rollback instants before R5).
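The leftover inflight file can be reproduced with a toy model of the cleanup
as it is described above. Everything here is an assumption made for
illustration: delete_rollbacks_inline, the KEEP constant and the tuple
representation of timeline files are hypothetical and do not mirror the real
Hudi classes.

```python
KEEP = 10  # stand-in for the hardcoded limit mentioned above

def delete_rollbacks_inline(timeline_files):
    """Toy cleanup: keep the KEEP oldest rollback instants; for the rest,
    remove only the completed file and never the inflight one."""
    rollback_times = sorted(
        {ts for ts, kind, state in timeline_files if kind == "rollback"}
    )
    to_delete = set(rollback_times[KEEP:])
    return [
        (ts, kind, state)
        for ts, kind, state in timeline_files
        if not (kind == "rollback" and ts in to_delete and state == "complete")
    ]

# Eleven rollback instants, each with an inflight file and a complete file.
files = [
    (ts, "rollback", state)
    for ts in range(1, 12)
    for state in ("inflight", "complete")
]
remaining = delete_rollbacks_inline(files)
leftover = [f for f in remaining if f[0] == 11]
print(leftover)
```

The eleventh rollback ends up with only its inflight file, which is the
R5.inflight situation shown in the timeline above.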
*Issue #4: Metadata Table reader does not show correct view of the metadata*
Assume the timeline is as in Issue #3 with a leftover rollback.inflight
instant. Also assume that the Metadata Table is synced only up to C4. The
MetadataTableWriter will not sync any more instants to the Metadata Table
because the next instant is incomplete.
The same sync logic is also used by the MetadataReader to perform the in-memory
merge of the timeline. Hence, the reader will also not consider C5, C6 and C7,
thereby providing an incorrect and older view of the FileSlices and FileGroups.
Any future ingestion into this table MAY insert data into older versions of the
FileSlices, which will result in data loss when queried.
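The blocking behaviour can be sketched as follows. This is a toy model, not the
actual writer or reader code: instants_to_sync and the (name, completed) tuples
are hypothetical. It assumes only what is stated above, namely that syncing
stops at the first incomplete instant after the last synced one.

```python
def instants_to_sync(timeline, last_synced):
    """Return the instants after `last_synced` that would be synced,
    stopping at the first incomplete instant."""
    names = [name for name, _ in timeline]
    start = names.index(last_synced) + 1
    result = []
    for name, completed in timeline[start:]:
        if not completed:
            break          # an incomplete instant blocks everything after it
        result.append(name)
    return result

# Timeline from Issue #3: R5 is a leftover rollback.inflight instant.
timeline = [("C1", True), ("C2", True), ("C3", True), ("C4", True),
            ("R5", False), ("C5", True), ("C6", True), ("C7", True)]
blocked = instants_to_sync(timeline, last_synced="C4")

# The same timeline without the leftover inflight instant.
clean = [instant for instant in timeline if instant[0] != "R5"]
unblocked = instants_to_sync(clean, last_synced="C4")
print(blocked, unblocked)
```

With the leftover instant present, nothing after C4 is picked up; without it,
C5, C6 and C7 are all eligible.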