Prashant Wason created HUDI-2286:
------------------------------------
Summary: Handle the case of failed deltacommit on the metadata table.
Key: HUDI-2286
URL: https://issues.apache.org/jira/browse/HUDI-2286
Project: Apache Hudi
Issue Type: Bug
Reporter: Prashant Wason
Assignee: Prashant Wason
Fix For: 0.9.0
Assume the current timeline state is as follows:
Dataset: C10
Metadata Table: DC10
The next ingestion run attempts a commit, which succeeds, but syncing to the
metadata table fails. This is the new timeline state:
Dataset: C10 C11
Metadata Table: DC10 DC11.inflight
The next ingestion run attempts to sync the metadata table in preWrite(). This
does the following:
1. Get list of instants to sync (C11)
2. MetadataTable.startCommitWithTime(C11)
3. autoRollback DC11.inflight
At this point the timelines are:
Dataset: C10 C11
Metadata Table: DC10 Rollback12
4. startCommitWithTime fails here with the following exception:
21/07/17 15:59:43 ERROR client.AbstractHoodieWriteClient: Cannot start a new
commit at time 20210717141448 as there are future commits present:
[[==>20210717141448__deltacommit__REQUESTED]]
The next ingestion run attempts a commit, which succeeds. This is the new
timeline state just after the commit has finished:
Dataset: C10 C11 C13
Metadata Table: DC10 Rollback12
Metadata table sync is then called. It picks up all instants to sync after the
last timestamp on the metadata table (12), which is only C13. The sync
succeeds, leading to this final timeline state:
Dataset: C10 C11 C13
Metadata Table: DC10 Rollback12 DC13
As a result, C11 is never committed to the metadata table.
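
The selection rule described above (sync every dataset instant later than the
last timestamp on the metadata table) can be sketched as follows. This is a
minimal illustration, not Hudi's actual code: the function name
instants_to_sync and the use of bare integers as timestamps are assumptions
made for the example.

```python
def instants_to_sync(dataset_timeline, metadata_timeline):
    """Return dataset instants newer than the latest metadata-table instant.

    Hypothetical helper: models the "sync everything after the last
    metadata timestamp" rule from the description above.
    """
    last_on_metadata = max(metadata_timeline, default=0)
    return [ts for ts in dataset_timeline if ts > last_on_metadata]


# Timeline state after the third ingestion run has committed C13.
dataset = [10, 11, 13]   # C10, C11, C13
metadata = [10, 12]      # DC10, Rollback12 (the rollback of DC11.inflight)

to_sync = instants_to_sync(dataset, metadata)
print(to_sync)  # [13]

# Dataset commits that are neither on the metadata table nor selected for sync.
missed = [ts for ts in dataset if ts not in metadata and ts not in to_sync]
print(missed)  # [11]
```

Because the rollback instant (12) is later than C11, it advances the last
timestamp on the metadata table past C11, so C11 is never selected.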