[ 
https://issues.apache.org/jira/browse/HUDI-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430048#comment-17430048
 ] 

Dave Hagman edited comment on HUDI-2559 at 10/18/21, 2:40 PM:
--------------------------------------------------------------

I have been extensively testing approach #2 and so far it has worked very well. 
I still need to fully test all the table services to ensure it doesn't break 
anything.

Based on my experience with approach #2, I don't see any reason why approach #1 
would not work; however, I do have one caution. Approach #2 guarantees that two 
writers will never create conflicting commits (provided the user ensures all 
writers have a unique ID), while approach #1 does not (it just makes a 
collision very unlikely). I have no worries about approach #1, but if we were 
to roll forward with it, we would need to provide guidance on what to do if a 
commit collision does occur. In my testing, simply restarting the failed writer 
usually worked fine, but we would need much more testing and verification here 
to ensure zero consistency/corruption issues (especially with the new metadata 
table functionality). 
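
To illustrate the difference between "very unlikely" and "never", here is a minimal sketch. It is not Hudi code: the class, the method names, the second-granularity format, and the idea of appending the writer ID to the instant are all assumptions made for illustration, since the details of either approach are not spelled out in this comment.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative only; every name in this class is hypothetical.
class InstantTimeSketch {

  // Assumed second-granularity instant format, chosen for the example.
  static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

  // Timestamp-only instant: two separate writer processes that read the
  // clock within the same second produce the same string.
  static String timestampOnly(long epochMillis) {
    return FORMAT.format(Instant.ofEpochMilli(epochMillis));
  }

  // Hypothetical variant: a user-supplied writer ID is appended, so equal
  // timestamps from different writers still yield different instants.
  static String withWriterId(long epochMillis, String writerId) {
    return timestampOnly(epochMillis) + "_" + writerId;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();

    // Two writers, same clock reading: the instants collide.
    System.out.println(timestampOnly(now).equals(timestampOnly(now)));

    // Same clock reading, distinct writer IDs: the instants differ.
    System.out.println(
        withWriterId(now, "writer-a").equals(withWriterId(now, "writer-b")));
  }
}
```

The second variant only avoids collisions as long as no two writers share an ID, which is the same caveat as above.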


was (Author: dave_hagman):
I have been extensively testing approach #2 and so far it has worked very well. 
I still need to fully test all the table services to ensure it doesn't break 
anything.

Based on my experience with approach #2, I don't see any reason why approach #1 
would not work; however, I do have one caution. Approach #2 guarantees that two 
writers will never create conflicting commits, while approach #1 does not (it 
just makes a collision very unlikely). I have no worries about approach #1, but 
if we were to roll forward with it, we would need to provide guidance on what 
to do if a commit collision does occur. In my testing, simply restarting the 
failed writer usually worked fine, but we would need much more testing and 
verification here to ensure zero consistency/corruption issues (especially with 
the new metadata table functionality). 

> Ensure unique timestamps are generated for commit times with concurrent 
> writers
> -------------------------------------------------------------------------------
>
>                 Key: HUDI-2559
>                 URL: https://issues.apache.org/jira/browse/HUDI-2559
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>
> Ensure unique timestamps are generated for commit times with concurrent 
> writers.
> This is the piece of code in HoodieActiveTimeline that creates a new commit 
> time:
> {code:java}
> public static String createNewInstantTime(long milliseconds) {
>   return lastInstantTime.updateAndGet((oldVal) -> {
>     String newCommitTime;
>     do {
>       newCommitTime = HoodieActiveTimeline.COMMIT_FORMATTER.format(
>           new Date(System.currentTimeMillis() + milliseconds));
>     } while (HoodieTimeline.compareTimestamps(
>         newCommitTime, LESSER_THAN_OR_EQUALS, oldVal));
>     return newCommitTime;
>   });
> }
> {code}
> There is a chance that a deltastreamer and a concurrent Spark datasource 
> writer get the same timestamp, in which case one of them fails. 
> Related GitHub issue and Jira ticket: 
> [https://github.com/apache/hudi/issues/3782]
> https://issues.apache.org/jira/browse/HUDI-2549



