[
https://issues.apache.org/jira/browse/HUDI-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128222#comment-17128222
]
Vinoth Chandar commented on HUDI-944:
-------------------------------------
Hi [~309637554] please go ahead with the HUDI-839 tests if that's a good change
to get started with.. Also happy to finish it up. So let me know :)
On (b), it's actually exciting to see that we have some similar ideas again :)
> We have also run into this scenario. Some databases use bucketing or sharding
> to solve this problem. With buckets, users first bucket their data by key
> using a hash-partitioning algorithm (Kafka has such an algorithm built in);
> then different Hudi clients write data with different keys and do not conflict
> when writing concurrently.
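For illustration, a minimal sketch of the key-hash bucketing idea described
above, similar in spirit to Kafka's default partitioner (the class name and
bucket count here are hypothetical, not part of Hudi):

{code:java}
/** Hypothetical sketch: map record keys to buckets so that writers
 *  assigned disjoint buckets never touch the same file groups. */
public class KeyBucketer {
    private final int numBuckets; // hypothetical bucket count

    public KeyBucketer(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    /** Every writer using the same function maps a given key to the same bucket. */
    public int bucketFor(String recordKey) {
        // Mask the sign bit rather than Math.abs(), which overflows for Integer.MIN_VALUE
        return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }
}
{code}

Because the mapping is deterministic, two concurrent writers that are handed
disjoint bucket ranges cannot write conflicting file groups.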
We need to introduce a set of bucketed logs to place the inserts in, and merge
them with the other base file groups.. Anyway, once you are ramped up, we can
continue this on a doc :)
HUDI-55, I feel, is very different: it is more about supporting point-lookup
style queries (we can just leverage RFC-08 to do a much better job of that).
> Support more complete concurrency control when writing data
> ------------------------------------------------------------
>
> Key: HUDI-944
> URL: https://issues.apache.org/jira/browse/HUDI-944
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: liwei
> Assignee: liwei
> Priority: Major
> Fix For: 0.6.0
>
>
> Today Hudi only supports concurrency control between writing and compaction,
> but some scenarios need concurrency control between writers, e.g. two Spark
> jobs with different data sources that need to write to the same Hudi table.
> I have two proposals:
> 1. First step: support write concurrency control across different partitions.
> Currently, when two clients write data to different partitions, they hit these
> errors:
> a. Rolling back commits failed
> b. Instant version already exists
> {code:java}
> [2020-05-25 21:20:34,732] INFO Checking for file exists
> ?/tmp/HudiDLATestPartition/.hoodie/20200525212031.clean.inflight
> (org.apache.hudi.common.table.timeline.HoodieActiveTimeline)
> Exception in thread "main" org.apache.hudi.exception.HoodieIOException:
> Failed to create file /tmp/HudiDLATestPartition/.hoodie/20200525212031.clean
>   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:437)
>   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
>   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionCleanInflightToComplete(HoodieActiveTimeline.java:290)
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:183)
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:142)
>   at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> {code}
> c. The two clients' archiving conflicts
> d. The reading client sees "Unable to infer schema for Parquet. It must be
> specified manually."
> 2. Second step: support insert, upsert, and compaction concurrency control at
> different isolation levels such as Serializable and WriteSerializable.
> Hudi could add a mechanism to check for conflicts in
> AbstractHoodieWriteClient.commit().
>
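To make the proposed commit-time check concrete, here is a hypothetical sketch
(not the actual Hudi API) of the optimistic-concurrency idea: a commit is
rejected if any file group it wrote was also written by a commit that completed
while it was in flight. All names below are illustrative:

{code:java}
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of an optimistic commit-time conflict check. */
public class ConflictChecker {
    /**
     * Returns true if the pending commit's file groups overlap those of any
     * commit that completed concurrently; the caller would then abort and retry.
     */
    public static boolean hasConflict(Set<String> pendingFiles,
                                      List<Set<String>> concurrentlyCommittedFiles) {
        for (Set<String> committed : concurrentlyCommittedFiles) {
            for (String fileGroup : committed) {
                if (pendingFiles.contains(fileGroup)) {
                    return true; // overlapping file group -> write-write conflict
                }
            }
        }
        return false; // disjoint file groups -> safe to commit
    }
}
{code}

Under a WriteSerializable-style level, only write-write overlaps like this need
to be detected; full Serializable would also have to consider reads.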
--
This message was sent by Atlassian Jira
(v8.3.4#803005)