Prashant Wason created HUDI-1649:
------------------------------------

             Summary: Serious production issues with Metadata Table in 0.7 
release
                 Key: HUDI-1649
                 URL: https://issues.apache.org/jira/browse/HUDI-1649
             Project: Apache Hudi
          Issue Type: Sub-task
            Reporter: Prashant Wason
            Assignee: Prashant Wason


We have discovered the following issues while using the Metadata Table code in 
production:

 

*Issue #1: Automatic rollbacks during commit get a timestamp which is out of 
order*

Suppose commit C1 failed. The next commit will try to roll back C1 
automatically. This creates the following two instants: C2.commit and 
R3.rollback. Hence, the rollback gets a later timestamp than the commit, even 
though the commit occurs after it. 

This is because of how the code is implemented in 
AbstractHoodieWriteClient.startCommitWithTime() where the timestamp of the next 
commit is chosen before the timestamp of the rollback instant.
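
The ordering problem can be reproduced with a toy model. The sketch below is 
not Hudi code: the function name, the tuple layout and the `rollback_first` 
switch are all hypothetical, and the "rollback first" branch is shown only for 
contrast, not as the agreed fix.

```python
import itertools

def start_commit(clock, failed_instant=None, rollback_first=False):
    """Toy model of starting a commit while an earlier commit has failed.

    `clock` yields increasing instant times. Every name here is hypothetical.
    Returns the new instants sorted by timestamp as (time, action, target).
    """
    instants = []
    if rollback_first:
        # Contrast case: the rollback's time is allocated before the commit's.
        if failed_instant is not None:
            instants.append((next(clock), "rollback", failed_instant))
        commit_time = next(clock)
    else:
        # Order described in this issue: the commit's time is chosen first.
        commit_time = next(clock)
        if failed_instant is not None:
            instants.append((next(clock), "rollback", failed_instant))
    instants.append((commit_time, "commit", None))
    return sorted(instants, key=lambda i: i[0])

# C1 failed; the clock hands out 2, 3, ...
described = start_commit(itertools.count(2), failed_instant=1)
print([(t, a) for t, a, _ in described])   # [(2, 'commit'), (3, 'rollback')]

contrast = start_commit(itertools.count(2), failed_instant=1, rollback_first=True)
print([(t, a) for t, a, _ in contrast])    # [(2, 'rollback'), (3, 'commit')]
```

The first call yields C2 and R3, matching the instants named above.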

 

*Issue #2: Syncing of rollbacks is not working*

Due to the above HUDI issue, syncing of rollbacks in Metadata Table does not 
work correctly. 

Assume the timeline as follows: 

Dataset Timeline: C1 C2 C3
Metadata Timeline: DC1 DC2 (DC = delta-commit)

 

Suppose the next commit C4 fails. When C5 is attempted, C4 will be 
automatically rolled back. Due to Issue #1, the timelines will become as 
follows:

Dataset Timeline: C1 C2 C3 C5 R6
Metadata Timeline: DC1 DC2

Now if the Metadata Table is synced (AbstractHoodieWriteClient.postCommit), the 
code will end up processing C5 first and then R6, which means that the files 
rolled back in R6 will be committed to the metadata table as deleted files. 
There is logic within HoodieTableMetadataUtils.processRollbackMetadata() to 
ignore R6 in this scenario, but it does not work because of Issue #1.
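
To see why instant order matters for the sync, here is a minimal simulation. 
It is not the Hudi implementation. In particular, the skip rule (ignore a 
rollback whose target is newer than the last synced instant) is an assumption 
made for illustration; the actual check in processRollbackMetadata() may 
differ.

```python
def sync(instants, last_synced):
    """Toy sync loop that applies instants in timestamp order.

    Each instant is (time, action, target); `target` is the rolled-back
    instant time for rollbacks and None for commits. The skip rule below is
    an ASSUMED stand-in for the real check, used only to show the effect of
    ordering.
    """
    applied = []
    for time, action, target in sorted(instants, key=lambda i: i[0]):
        if action == "rollback" and target > last_synced:
            # The rolled-back commit was never synced: nothing to undo.
            applied.append((time, "rollback-skipped"))
        else:
            applied.append((time, action))
        last_synced = time
    return applied

# Out-of-order timestamps (this issue): C5 is synced before R6, so the
# rollback of C4 no longer looks "newer than the last synced instant".
print(sync([(5, "commit", None), (6, "rollback", 4)], last_synced=3))
# [(5, 'commit'), (6, 'rollback')]

# If the rollback had the earlier timestamp, it would be skipped.
print(sync([(5, "rollback", 4), (6, "commit", None)], last_synced=3))
# [(5, 'rollback-skipped'), (6, 'commit')]
```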

  

*Issue #3: Rollback instants are deleted inline*

The current rollback code deletes older instants inline. The delete logic keeps 
the oldest ten rollback instants (hardcoded) and removes all more-recent 
rollback instants. Furthermore, the deletion ONLY deletes the rollback.complete 
file and does not remove the corresponding rollback.inflight file. 

Hence, with many rollbacks, the following timeline is possible:

Timeline: C1 C2 C3 C4 R5.inflight C5 C6 C7 ...

(There are 9 rollback instants prior to R5.)
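
The cleanup behaviour described above can be sketched as follows. This is a 
toy model, not the Hudi code: the constant name, the function name and the 
file names (`<time>.rollback` for the completed file, `<time>.rollback.inflight` 
for the inflight file) are illustrative assumptions.

```python
MIN_ROLLBACKS_TO_KEEP = 10  # assumed name for the hardcoded limit

def delete_older_rollbacks(rollback_times, files, keep=MIN_ROLLBACKS_TO_KEEP):
    """Toy model of the inline cleanup described in this issue.

    Keeps the `keep` OLDEST rollback instants and, for every more recent one,
    removes only the completed file. The inflight file is left behind.
    """
    ordered = sorted(rollback_times)
    for t in ordered[keep:]:
        files.discard(f"{t}.rollback")
        # f"{t}.rollback.inflight" is intentionally not removed here,
        # mirroring the behaviour reported above.
    return files

# Twelve rollbacks, each with a completed and an inflight file.
times = list(range(1, 13))
files = set()
for t in times:
    files.add(f"{t}.rollback")
    files.add(f"{t}.rollback.inflight")

left = delete_older_rollbacks(times, files)
print("11.rollback" in left)            # False
print("11.rollback.inflight" in left)   # True
```

After the call, rollbacks 11 and 12 look incomplete on the timeline even 
though they finished.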

 

*Issue #4: Metadata Table reader does not show correct view of the metadata*

Assume the timeline is as in Issue #3, with a leftover rollback.inflight 
instant. Also assume that the metadata table is synced only up to C4. The 
MetadataTableWriter will not sync any more instants to the Metadata Table 
because the next instant is incomplete.

The same sync logic is also used by the MetadataReader to perform the in-memory 
merge of the timeline. Hence, the reader will also not consider C5, C6 and C7, 
thereby providing an incorrect, stale view of the FileSlices and FileGroups. 
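
The stalled sync can be illustrated with a small sketch. It assumes, as 
described above, that the sync stops at the first incomplete instant after the 
last synced one. The function name and the (name, completed) tuple layout are 
hypothetical and do not come from the Hudi code.

```python
def instants_to_sync(timeline, last_synced):
    """Return the instants a sync would pick up after `last_synced`.

    `timeline` is a list of (name, completed) pairs in timeline order.
    The scan stops at the first incomplete instant, so everything behind a
    leftover inflight instant is never reached.
    """
    names = [name for name, _ in timeline]
    start = names.index(last_synced) + 1
    result = []
    for name, completed in timeline[start:]:
        if not completed:
            break
        result.append(name)
    return result

timeline = [
    ("C1", True), ("C2", True), ("C3", True), ("C4", True),
    ("R5.inflight", False),  # leftover from the inline cleanup
    ("C5", True), ("C6", True), ("C7", True),
]
print(instants_to_sync(timeline, "C4"))   # []

# Without the leftover inflight instant, C5..C7 would be picked up.
clean = [i for i in timeline if i[1]]
print(instants_to_sync(clean, "C4"))      # ['C5', 'C6', 'C7']
```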

 

Any future ingestion into this table MAY insert data into older versions of the 
FileSlices, which will show up as data loss when the table is queried.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
