[ 
https://issues.apache.org/jira/browse/HUDI-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247769#comment-17247769
 ] 

Vinoth Chandar commented on HUDI-944:
-------------------------------------

cc [~nishith29] here. He is taking over HUDI-845 as well.

 

[~windpiger] yes, we don't do a conflict check, and we don't have to, per se, at 
least for MOR tables. Hudi's design models everything as a log, unlike other 
systems that treat tables as snapshots to be updated in place. So we can simply 
log new updates/deletes and let the merging logic handle them, which lets us 
support much higher concurrency.
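To illustrate the idea (a minimal sketch with hypothetical names, not Hudi's actual classes): writers append update/delete entries to a log, and readers merge per key with latest-wins semantics, so concurrent writers never need to coordinate on a shared snapshot.

{code:java}
import java.util.*;

// Hypothetical sketch (not Hudi's actual classes) of log-based merging:
// writers append update/delete entries; readers merge per key, latest wins.
public class LogMergeSketch {
    static class LogEntry {
        final String key; final String value; final long ts; final boolean delete;
        LogEntry(String key, String value, long ts, boolean delete) {
            this.key = key; this.value = value; this.ts = ts; this.delete = delete;
        }
    }

    // Merge the log into a snapshot: for each key keep the highest-timestamp
    // entry; if that entry is a delete, the key is absent from the snapshot.
    static Map<String, String> merge(List<LogEntry> log) {
        Map<String, LogEntry> latest = new HashMap<>();
        for (LogEntry e : log) {
            LogEntry cur = latest.get(e.key);
            if (cur == null || e.ts >= cur.ts) latest.put(e.key, e);
        }
        Map<String, String> snapshot = new HashMap<>();
        for (LogEntry e : latest.values()) {
            if (!e.delete) snapshot.put(e.key, e.value);
        }
        return snapshot;
    }

    public static void main(String[] args) {
        List<LogEntry> log = Arrays.asList(
            new LogEntry("k1", "v1", 1, false),
            new LogEntry("k1", "v2", 2, false),   // later update wins
            new LogEntry("k2", "v3", 1, false),
            new LogEntry("k2", null, 3, true));   // later delete removes k2
        System.out.println(merge(log));           // {k1=v2}
    }
}
{code}

Because the merge is deterministic from the log, two writers logging to disjoint (or even overlapping) keys don't need to block each other; ordering is resolved at read/compaction time.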

 

That said, for COW, these checks can help provide some form of serializability 
for writes. But note that even in systems that do this, like Delta, the checks 
are only best-effort unless you have an atomic rename or a similar primitive.
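As a rough illustration of why the atomic primitive matters (a hypothetical helper, not Hudi's actual createImmutableFileInPath): with an atomic create, exactly one of two concurrent writers can complete a commit marker; on stores without atomic create/rename (e.g. S3), a check-then-write sequence can race and the check degrades to best-effort.

{code:java}
import java.io.IOException;
import java.nio.file.*;

// Hypothetical sketch: an atomic file create used as a commit marker.
// On a local/POSIX filesystem Files.createFile is atomic, so exactly one
// of two concurrent writers succeeds; on object stores without atomic
// create/rename, a separate exists-check followed by a write can race.
public class CommitMarkerSketch {
    static boolean tryCommit(Path marker) {
        try {
            Files.createFile(marker); // fails atomically if it already exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;             // another writer committed first
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hoodie-sketch");
        Path marker = dir.resolve("20200525212031.commit");
        System.out.println(tryCommit(marker)); // true: first writer wins
        System.out.println(tryCommit(marker)); // false: second writer loses
    }
}
{code}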

> Support more complete concurrency control when writing data
> ------------------------------------------------------------
>
>                 Key: HUDI-944
>                 URL: https://issues.apache.org/jira/browse/HUDI-944
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: liwei
>            Assignee: liwei
>            Priority: Major
>             Fix For: 0.7.0
>
>
> Currently Hudi only supports concurrency control between writing and 
> compaction. But some scenarios need concurrency control between writers, 
> for example two Spark jobs with different data sources that need to write 
> to the same Hudi table.
> I have a two-step proposal:
> 1. First step: support write concurrency control across different partitions.
>  Currently, when two clients write data to different partitions, they hit 
> these errors:
> a. Rolling back commits failed
> b. instant version already exists
> {code:java}
>  [2020-05-25 21:20:34,732] INFO Checking for file exists ?/tmp/HudiDLATestPartition/.hoodie/20200525212031.clean.inflight (org.apache.hudi.common.table.timeline.HoodieActiveTimeline)
>  Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Failed to create file /tmp/HudiDLATestPartition/.hoodie/20200525212031.clean
>  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:437)
>  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
>  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionCleanInflightToComplete(HoodieActiveTimeline.java:290)
>  at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:183)
>  at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:142)
>  at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
>  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>  {code}
> c. archiving conflicts between the two clients
> d. the reading client fails with "Unable to infer schema for Parquet. It 
> must be specified manually.;"
> 2. Second step: support insert/upsert/compaction concurrency control at 
> different isolation levels, such as Serializable and WriteSerializable.
> Hudi could add a mechanism to check for conflicts in 
> AbstractHoodieWriteClient.commit()
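A rough sketch of what such a commit-time conflict check could look like (hypothetical types and names, not the actual AbstractHoodieWriteClient API): before finalizing, a commit compares the file groups it touched against those of commits that completed after it started, and aborts on overlap. Writers to disjoint partitions touch disjoint file groups, so step 1 falls out of this check naturally.

{code:java}
import java.util.*;

// Hypothetical sketch of an optimistic commit-time conflict check; the
// types and names are illustrative, not Hudi's actual API.
public class ConflictCheckSketch {
    static class Commit {
        final long instantTs; final Set<String> touchedFileGroups;
        Commit(long instantTs, Set<String> touchedFileGroups) {
            this.instantTs = instantTs; this.touchedFileGroups = touchedFileGroups;
        }
    }

    // A candidate commit conflicts if any commit that completed AFTER the
    // candidate started touched an overlapping set of file groups.
    static boolean hasConflict(long candidateStartTs, Set<String> candidateFiles,
                               List<Commit> completedCommits) {
        for (Commit c : completedCommits) {
            if (c.instantTs <= candidateStartTs) continue; // visible at start, ok
            if (!Collections.disjoint(c.touchedFileGroups, candidateFiles)) {
                return true; // overlapping file groups -> abort and retry
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Commit> timeline = Collections.singletonList(
            new Commit(5, Collections.singleton("partition1/fg-1")));
        // Writer started at ts=3; a commit at ts=5 touched the same file group.
        System.out.println(hasConflict(3,
            Collections.singleton("partition1/fg-1"), timeline)); // true
        // Writers to different partitions touch disjoint file groups: no conflict.
        System.out.println(hasConflict(3,
            Collections.singleton("partition2/fg-9"), timeline)); // false
    }
}
{code}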
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
