[ https://issues.apache.org/jira/browse/HUDI-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241517#comment-17241517 ]
Song Jun commented on HUDI-944:
-------------------------------
[~vinoth] Currently Hudi does not run a conflict check at commit time, right?
For example, two jobs read from the same Hudi snapshot and both modify the
same HoodieKey. If job1 commits first, job2's commit should run a conflict
check and fail, because job2 should re-read from job1's commit instant,
reprocess, and commit a new instant.
> Support more complete concurrency control when writing data
> ------------------------------------------------------------
>
> Key: HUDI-944
> URL: https://issues.apache.org/jira/browse/HUDI-944
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: liwei
> Assignee: liwei
> Priority: Major
> Fix For: 0.6.1
>
>
> Currently Hudi only supports concurrency control between writing and
> compaction. But some scenarios need concurrency control between writers, for
> example two Spark jobs with different data sources that must write to the
> same Hudi table.
> I have two proposals:
> 1. First step: support concurrent writes to different partitions.
> Currently, when two clients write data to different partitions, they hit
> these errors:
> a. Rolling back commits failed
> b. Instant version already exists
> {code:java}
> [2020-05-25 21:20:34,732] INFO Checking for file exists /tmp/HudiDLATestPartition/.hoodie/20200525212031.clean.inflight (org.apache.hudi.common.table.timeline.HoodieActiveTimeline)
> Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Failed to create file /tmp/HudiDLATestPartition/.hoodie/20200525212031.clean
>     at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:437)
>     at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
>     at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionCleanInflightToComplete(HoodieActiveTimeline.java:290)
>     at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:183)
>     at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:142)
>     at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> {code}
> c. Archiving by the two clients conflicts
> d. The reading client hits "Unable to infer schema for Parquet. It must be specified manually.;"
> 2. Second step: support concurrency control for insert, upsert and
> compaction at different isolation levels, such as Serializable and
> WriteSerializable. Hudi could add a mechanism that checks for conflicts in
> AbstractHoodieWriteClient.commit().
>
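Both steps of the proposal can be framed as a check run at commit time. The sketch below uses one simplified interpretation of the two isolation levels, which the issue names but does not define: under WRITE_SERIALIZABLE a commit fails only if a concurrent commit wrote to a partition it also writes (so step 1's writers on different partitions never conflict), while SERIALIZABLE also fails when a concurrent commit wrote to a partition this transaction read. `PartitionConflictCheck`, `TxnFootprint` and `canCommit` are hypothetical names, not part of `AbstractHoodieWriteClient`.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical commit-time conflict check at partition granularity; not a Hudi API.
public class PartitionConflictCheck {

    enum IsolationLevel { WRITE_SERIALIZABLE, SERIALIZABLE }

    /** Partitions a transaction read and wrote. */
    static final class TxnFootprint {
        final Set<String> readPartitions;
        final Set<String> writtenPartitions;

        TxnFootprint(Set<String> readPartitions, Set<String> writtenPartitions) {
            this.readPartitions = readPartitions;
            this.writtenPartitions = writtenPartitions;
        }
    }

    /**
     * Decides whether `mine` may commit, given the footprints of commits that
     * completed after `mine` took its snapshot.
     */
    static boolean canCommit(TxnFootprint mine, List<TxnFootprint> concurrentCommits, IsolationLevel level) {
        for (TxnFootprint other : concurrentCommits) {
            // Write-write overlap is a conflict under both levels.
            if (!Collections.disjoint(mine.writtenPartitions, other.writtenPartitions)) {
                return false;
            }
            // SERIALIZABLE additionally rejects having read a partition a concurrent commit changed.
            if (level == IsolationLevel.SERIALIZABLE
                    && !Collections.disjoint(mine.readPartitions, other.writtenPartitions)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Writer A committed to p1 while writer B, which read p1 and p2, was writing p2.
        TxnFootprint a = new TxnFootprint(Set.of("p1"), Set.of("p1"));
        TxnFootprint b = new TxnFootprint(Set.of("p1", "p2"), Set.of("p2"));
        System.out.println(canCommit(b, List.of(a), IsolationLevel.WRITE_SERIALIZABLE)); // true
        System.out.println(canCommit(b, List.of(a), IsolationLevel.SERIALIZABLE));       // false
    }
}
```

Partition granularity keeps the check cheap but rejects some commits that touch disjoint records in the same partition; the same structure would work with file groups or record keys as the unit.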
--
This message was sent by Atlassian Jira
(v8.3.4#803005)