davehagman opened a new pull request #3824:
URL: https://github.com/apache/hudi/pull/3824
## What is the purpose of the pull request
To support multiple writers against a given table, we need to eliminate (or
at least greatly reduce) the chance that writers running in parallel create
instants with the same instant ID. The ID is currently derived from the
current time at second granularity.
With that strategy, instant time collisions are frequent even with just two
writers. To make collisions far less likely, this PR adds millisecond
granularity to the instant timestamp. This does not guarantee a collision-free
timeline, so the PR also includes steps to reconcile state if a collision does
occur.
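
To illustrate why the extra precision helps, here is a minimal sketch and not the code in this PR. The class name `InstantTimeSketch`, its helper methods, and both pattern strings (`yyyyMMddHHmmss` and `yyyyMMddHHmmssSSS`) are assumptions made for illustration; the actual formatter lives in `HoodieActiveTimeline.COMMIT_FORMATTER`.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class InstantTimeSketch {
    // Assumed patterns for illustration only; not copied from the Hudi source.
    static final DateTimeFormatter SECONDS = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    static final DateTimeFormatter MILLIS = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Instant ID at second granularity.
    static String secondInstant(LocalDateTime t) {
        return SECONDS.format(t);
    }

    // Instant ID at millisecond granularity.
    static String millisInstant(LocalDateTime t) {
        return MILLIS.format(t);
    }

    public static void main(String[] args) {
        // Two hypothetical writers starting a commit within the same second.
        LocalDateTime writerA = LocalDateTime.of(2021, 10, 19, 12, 0, 0, 120_000_000);
        LocalDateTime writerB = LocalDateTime.of(2021, 10, 19, 12, 0, 0, 870_000_000);

        // Second granularity: both writers get the same ID.
        System.out.println(secondInstant(writerA).equals(secondInstant(writerB))); // true

        // Millisecond granularity: the IDs differ.
        System.out.println(millisInstant(writerA).equals(millisInstant(writerB))); // false
    }
}
```

Two writers that start within the same millisecond would still produce identical IDs in this sketch, which is why reconciliation is still needed.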
## Brief change log
- *Modified HoodieActiveTimeline.COMMIT_FORMATTER to add milliseconds*
## Verify this pull request
* Ran and verified existing unit tests around timeline operations
* Manually tested with multiple writers, which created hundreds of timeline
entries without a collision
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.