Prashant Wason created HUDI-2286:
------------------------------------

             Summary: Handle the case of failed deltacommit on the metadata table.
                 Key: HUDI-2286
                 URL: https://issues.apache.org/jira/browse/HUDI-2286
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Prashant Wason
            Assignee: Prashant Wason
             Fix For: 0.9.0


Assume the current timeline state is as follows:
Dataset: C10
Metadata Table: DC10

 

The next ingestion run attempts a commit, which succeeds, but syncing to the 
metadata table fails. This is the new timeline state:
Dataset: C10  C11
Metadata Table: DC10   DC11.inflight

 

The next ingestion run attempts to sync the metadata table in preWrite(). This 
does the following:
1. Get the list of instants to sync (C11)
2. MetadataTable.startCommitWithTime(C11)
3. autoRollback DC11.inflight

   At this time the timelines will be:
   Dataset: C10  C11
   Metadata Table: DC10  Rollback12

4. startCommitWithTime fails here with the following exception:
21/07/17 15:59:43 ERROR client.AbstractHoodieWriteClient: Cannot start a new 
commit at time 20210717141448 as there are future commits present: 
[[==>20210717141448__deltacommit__REQUESTED]]
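
The failure in step 4 can be illustrated with a toy model of the guard. This is only a sketch, not Hudi's implementation: the function name can_start_commit and the tuple layout are hypothetical, and it assumes the guard rejects a new instant whenever the timeline already holds an instant with a later timestamp. The short instant times (10, 11, 12, 13) stand in for the real timestamps.

```python
# Toy model of the "future commits present" guard; NOT Hudi's actual code.
# Assumption: a new instant is rejected if any existing instant is later.
def can_start_commit(timeline, new_instant_time):
    """Return True only if no instant on the timeline is later than the new one."""
    return all(ts <= new_instant_time for ts, _action in timeline)

# Metadata table timeline after the auto-rollback of DC11.inflight (step 3).
metadata_timeline = [(10, "deltacommit"), (12, "rollback")]

# Starting DC11 now fails because Rollback12 is later than 11.
ok_11 = can_start_commit(metadata_timeline, 11)
# A later instant time, such as 13, would pass the guard.
ok_13 = can_start_commit(metadata_timeline, 13)
print(ok_11, ok_13)
```

Under this assumption, the rollback instant created in step 3 is what blocks the deltacommit that step 2 set out to start.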

 

The next ingestion run attempts a commit, which succeeds. This is the new 
timeline state just after the commit has finished:
Dataset: C10  C11  C13
Metadata Table: DC10  Rollback12

Metadata table sync is then called. It picks up all instants to sync after the 
last timestamp on the metadata table, which here is only C13 (the last 
timestamp on the metadata table is 12). The sync succeeds, leading to this 
final timeline state:
Dataset: C10  C11  C13
Metadata Table: DC10  Rollback12  DC13

So C11 never got committed to the metadata table.
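
The gap can be reproduced with a small simulation of the selection rule described above. This is a sketch under stated assumptions: instants_to_sync is a hypothetical helper, not a Hudi API, and it assumes the sync selects every dataset commit whose time is later than the latest instant of any kind on the metadata table timeline.

```python
# Toy simulation of the sync selection rule; names here are hypothetical.
def instants_to_sync(dataset_commits, metadata_timeline):
    """Select dataset commits later than the latest metadata table instant."""
    last_ts = max(ts for ts, _action in metadata_timeline)
    return [c for c in dataset_commits if c > last_ts]

dataset_commits = [10, 11, 13]
# Metadata table timeline after the failed attempt: DC10 and Rollback12.
metadata_timeline = [(10, "deltacommit"), (12, "rollback")]

to_sync = instants_to_sync(dataset_commits, metadata_timeline)

# Commits covered by a deltacommit once this sync completes.
synced = {ts for ts, action in metadata_timeline if action == "deltacommit"}
synced |= set(to_sync)

# Dataset commits that never reach the metadata table.
missing = sorted(set(dataset_commits) - synced)
print(to_sync, missing)
```

Because Rollback12 advances the latest timestamp past 11, the rule selects only C13, and C11 ends up in the missing list.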



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
