[
https://issues.apache.org/jira/browse/HUDI-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395187#comment-17395187
]
ASF GitHub Bot commented on HUDI-2286:
--------------------------------------
prashantwason opened a new pull request #3428:
URL: https://github.com/apache/hudi/pull/3428
## What is the purpose of the pull request
A failed deltacommit on the metadata table is rolled back automatically.
Assuming the failed deltacommit was "t10", the rollback happens on the next
write, at "t11". After the rollback, when we sync the dataset to the metadata
table, we should look for all unsynced instants, including t11. The current
code ignores t11 because the latest timestamp on the metadata table is now t11
(the rollback instant).
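The selection problem described above can be illustrated with a small, self-contained sketch. It is not Hudi code: the `Instant` class, both method names, and the "09"/"10"/"11" timestamps are hypothetical. The assumed remedy (letting only completed deltacommits move the sync cutoff) is one plausible reading of the description, not necessarily what this pull request implements.

```java
import java.util.ArrayList;
import java.util.List;

public class MetadataSyncSketch {

    /** Hypothetical, simplified instant: a fixed-width timestamp plus an action name. */
    static final class Instant {
        final String timestamp;
        final String action;

        Instant(String timestamp, String action) {
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    /** Returns the dataset instants strictly newer than the cutoff, in timeline order. */
    static List<String> after(List<String> datasetInstants, String cutoff) {
        List<String> result = new ArrayList<>();
        for (String ts : datasetInstants) {
            if (ts.compareTo(cutoff) > 0) {
                result.add(ts);
            }
        }
        return result;
    }

    /** Problematic selection: any metadata instant, including a rollback, moves the cutoff. */
    static List<String> unsyncedUsingLatestInstant(List<String> datasetInstants,
                                                   List<Instant> metadataTimeline) {
        String cutoff = "";
        for (Instant i : metadataTimeline) {
            if (i.timestamp.compareTo(cutoff) > 0) {
                cutoff = i.timestamp;
            }
        }
        return after(datasetInstants, cutoff);
    }

    /** Assumed remedy: only completed deltacommits move the cutoff. */
    static List<String> unsyncedUsingLatestDeltaCommit(List<String> datasetInstants,
                                                       List<Instant> metadataTimeline) {
        String cutoff = "";
        for (Instant i : metadataTimeline) {
            if (i.action.equals("deltacommit") && i.timestamp.compareTo(cutoff) > 0) {
                cutoff = i.timestamp;
            }
        }
        return after(datasetInstants, cutoff);
    }

    public static void main(String[] args) {
        // Dataset commits 09, 10 and 11 succeeded; deltacommit 10 failed on the
        // metadata table and was rolled back by a rollback instant at 11.
        List<String> dataset = List.of("09", "10", "11");
        List<Instant> metadata = List.of(
                new Instant("09", "deltacommit"),
                new Instant("11", "rollback"));

        System.out.println(unsyncedUsingLatestInstant(dataset, metadata));     // prints []
        System.out.println(unsyncedUsingLatestDeltaCommit(dataset, metadata)); // prints [10, 11]
    }
}
```

With the rollback instant acting as the cutoff, neither t10 nor t11 is selected for syncing. With the cutoff taken from the last completed deltacommit, both are.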
## Brief change log
## Verify this pull request
A unit test has been added.
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
> Handle the case of failed deltacommit on the metadata table.
> ------------------------------------------------------------
>
> Key: HUDI-2286
> URL: https://issues.apache.org/jira/browse/HUDI-2286
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Prashant Wason
> Assignee: Prashant Wason
> Priority: Major
> Fix For: 0.9.0
>
>
> Assume the current timeline state is as follows:
> Dataset: C10
> Metadata Table: DC10
>
> The next ingestion run attempts a commit, which succeeds, but syncing to the
> metadata table fails. This is the new timeline state:
> Dataset: C10 C11
> Metadata Table: DC10 DC11.inflight
>
> The next ingestion run attempts to sync the metadata table in preWrite().
> This does the following:
> 1. Get list of instants to sync (C11)
> 2. MetadataTable.startCommitWithTime(C11)
> 3. autoRollback DC11.inflight
> At this point the timelines are:
> Dataset: C10 C11
> Metadata Table: DC10 Rollback12
> 4. startCommitWithTime fails here with the following exception:
> 21/07/17 15:59:43 ERROR client.AbstractHoodieWriteClient: Cannot start a new
> commit at time 20210717141448 as there are future commits present:
> [[==>20210717141448__deltacommit__REQUESTED]]
>
> The next ingestion run attempts a commit, which succeeds. This is the new
> timeline state just after the commit has finished:
> Dataset: C10 C11 C13
> Metadata Table: DC10 Rollback12
> Metadata table sync is then called. It picks up all instants to sync after
> the last timestamp on the metadata table, which is only C13 (last timestamp
> on metadata = 12). The sync succeeds, leading to this final timeline state:
> Dataset: C10 C11 C13
> Metadata Table: DC10 Rollback12 DC13
> So C11 never got committed to the metadata table.