[ https://issues.apache.org/jira/browse/HUDI-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430048#comment-17430048 ]
Dave Hagman edited comment on HUDI-2559 at 10/18/21, 2:40 PM:
--------------------------------------------------------------
I have been extensively testing approach #2 and so far it has worked very well.
I still need to fully test all the table services to ensure it doesn't break
anything.
Based on my experience with approach #2, I don't see any reason why approach #1
would not work; however, I do have one caution. Approach #2 guarantees that two
writers will never create conflicting commits (provided the user ensures that
every writer has a unique ID), while approach #1 does not (it just makes a
collision very unlikely). I have no worries about approach #1, but if we were to
go forward with it, we would need to provide guidance on what to do if a commit
collision does occur. In my testing, simply restarting the failed writer usually
worked fine, but we would need much more testing and verification around this
to ensure there are no consistency or corruption issues (especially with the new
metadata table functionality).
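The comment does not spell out how approach #2 uses the writer ID. The following is a minimal sketch that assumes the ID is appended as a suffix to a second-granularity timestamp. It shows why two writers that read the same clock tick can no longer collide, as long as their IDs differ. The class name `WriterScopedInstantTime`, the `_` separator, the UTC format, and the writer IDs are all hypothetical and are not Hudi's API.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of a writer-ID suffix; NOT Hudi's implementation of approach #2.
public class WriterScopedInstantTime {
    // Second-granularity, UTC timestamp format (assumption for this sketch).
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    // Two writers that read the same clock value still get distinct instants,
    // provided their writer IDs differ (the user must guarantee that).
    static String newInstantTime(Instant now, String writerId) {
        return FORMAT.format(now) + "_" + writerId;
    }

    public static void main(String[] args) {
        Instant sameTick = Instant.parse("2021-10-18T14:40:00Z");
        String a = newInstantTime(sameTick, "deltastreamer-1");
        String b = newInstantTime(sameTick, "spark-ds-1");
        System.out.println(a);           // 20211018144000_deltastreamer-1
        System.out.println(b);           // 20211018144000_spark-ds-1
        System.out.println(a.equals(b)); // false
    }
}
```

One trade-off with this kind of scheme is that instants are still compared as strings. The suffix therefore only breaks ties between writers. It says nothing about which of them actually committed first.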
was (Author: dave_hagman):
I have been extensively testing approach #2 and so far it has worked very well.
I still need to fully test all the table services to ensure it doesn't break
anything.
Based on my experience with approach #2, I don't see any reason why approach #1
would not work; however, I do have one caution. Approach #2 guarantees that two
writers will never create conflicting commits, while approach #1 does not (it
just makes a collision very unlikely). I have no worries about approach #1, but
if we were to go forward with it, we would need to provide guidance on what to
do if a commit collision does occur. In my testing, simply restarting the failed
writer usually worked fine, but we would need much more testing and verification
around this to ensure there are no consistency or corruption issues (especially
with the new metadata table functionality).
> Ensure unique timestamps are generated for commit times with concurrent
> writers
> -------------------------------------------------------------------------------
>
> Key: HUDI-2559
> URL: https://issues.apache.org/jira/browse/HUDI-2559
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
>
> Ensure unique timestamps are generated for commit times with concurrent
> writers.
> This is the piece of code in HoodieActiveTimeline that creates a new commit
> time.
> {code:java}
> public static String createNewInstantTime(long milliseconds) {
>   // lastInstantTime is static, so this only orders instants issued within a single JVM.
>   return lastInstantTime.updateAndGet((oldVal) -> {
>     String newCommitTime;
>     do {
>       newCommitTime = HoodieActiveTimeline.COMMIT_FORMATTER.format(
>           new Date(System.currentTimeMillis() + milliseconds));
>       // Keep regenerating until the timestamp is strictly newer than the last one issued.
>     } while (HoodieTimeline.compareTimestamps(newCommitTime, LESSER_THAN_OR_EQUALS, oldVal));
>     return newCommitTime;
>   });
> }
> {code}
> There is a chance that a DeltaStreamer and a concurrent Spark datasource writer
> get the same timestamp, in which case one of them fails.
> Related GitHub issues and Jiras:
> [https://github.com/apache/hudi/issues/3782]
> https://issues.apache.org/jira/browse/HUDI-2549
>
>
>
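The quoted `createNewInstantTime` only spins until the new timestamp is strictly greater than the last one issued through its static `lastInstantTime`. As a result, uniqueness holds within one JVM but not across separate writer processes. The sketch below models that loop with a hypothetical `InstantTimeGenerator` class. Each instance stands in for one process, and the clock is injected as a `LongSupplier`. The formatter uses UTC, and plain `String.compareTo` replaces `HoodieTimeline.compareTimestamps`. These are simplifications for illustration, not Hudi code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;

// Hypothetical model of the quoted createNewInstantTime loop (not Hudi code).
// Each instance stands in for one writer process with its own lastInstantTime.
public class InstantTimeGenerator {
    // UTC, second-granularity format (assumption; simplifies COMMIT_FORMATTER).
    private static final DateTimeFormatter COMMIT_FORMATTER =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    private final AtomicReference<String> lastInstantTime = new AtomicReference<>("0");
    private final LongSupplier clockMillis;

    InstantTimeGenerator(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    String createNewInstantTime() {
        return lastInstantTime.updateAndGet(oldVal -> {
            String newCommitTime;
            do {
                newCommitTime = COMMIT_FORMATTER.format(Instant.ofEpochMilli(clockMillis.getAsLong()));
            } while (newCommitTime.compareTo(oldVal) <= 0); // spin until newer than THIS instance's last instant
            return newCommitTime;
        });
    }

    public static void main(String[] args) {
        long t0 = Instant.parse("2021-10-18T14:40:00Z").toEpochMilli();

        // Same process: the clock advances 250 ms per read, so the second call
        // spins until the formatted second changes.
        AtomicLong ticks = new AtomicLong(t0);
        InstantTimeGenerator single = new InstantTimeGenerator(() -> ticks.getAndAdd(250));
        System.out.println(single.createNewInstantTime()); // 20211018144000
        System.out.println(single.createNewInstantTime()); // 20211018144001

        // Two processes reading the same second: nothing coordinates them.
        // A frozen clock is safe here only because each generator is called once.
        String a = new InstantTimeGenerator(() -> t0).createNewInstantTime();
        String b = new InstantTimeGenerator(() -> t0).createNewInstantTime();
        System.out.println(a.equals(b)); // true: both writers claim 20211018144000
    }
}
```

Inside one JVM, the shared `lastInstantTime` forces successive instants apart. Two JVMs each start from their own copy, so nothing prevents them from producing the same string. That is the DeltaStreamer plus Spark datasource collision described above.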
--
This message was sent by Atlassian Jira
(v8.3.4#803005)