[
https://issues.apache.org/jira/browse/HUDI-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886814#comment-17886814
]
Y Ethan Guo edited comment on HUDI-8300 at 10/4/24 1:00 AM:
------------------------------------------------------------
At a high level, what needs to be done is to make sure the partitioner handles
bucketing properly and does not apply small file handling, i.e., it must not
merge inserts, updates, and deletes into existing base files. Under NBCC, the
writer only generates log files for updates/deletes (in some cases, we can
still do a conflict check on new file slices).
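A minimal sketch of the partitioning rule described above, assuming the batch has already been tagged by the index. `NbccPartitionerSketch`, `TaggedRecord`, `Assignment`, and `maxInsertsPerNewGroup` are hypothetical names, not Hudi APIs. Tagged records (updates/deletes) become log appends to the file group they already live in, while untagged records (inserts) are packed into new file groups instead of being merged into small existing base files.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NbccPartitionerSketch {

  /** A record key plus the file group the index tagged it with (null means no match was found). */
  record TaggedRecord(String key, String fileGroupId) {}

  /** Where the writer sends a record. */
  enum Target { LOG_APPEND_EXISTING, NEW_FILE_GROUP }

  record Assignment(Target target, String fileGroupId) {}

  /**
   * Routes a batch with no small file handling: tagged records (updates/deletes) are appended
   * as log files to their existing file group, and untagged records (inserts) are packed into
   * brand-new file groups instead of being merged into existing base files.
   */
  static Map<String, Assignment> assign(List<TaggedRecord> records, int maxInsertsPerNewGroup) {
    if (maxInsertsPerNewGroup < 1) {
      throw new IllegalArgumentException("maxInsertsPerNewGroup must be at least 1");
    }
    Map<String, Assignment> assignments = new LinkedHashMap<>();
    int newGroupCount = 0;
    int insertsInCurrentGroup = 0;
    String currentNewGroup = null;
    for (TaggedRecord r : records) {
      if (r.fileGroupId() != null) {
        // Update/delete: only a log file is written; the existing base file is never rewritten.
        assignments.put(r.key(), new Assignment(Target.LOG_APPEND_EXISTING, r.fileGroupId()));
        continue;
      }
      // Insert: open a new file group when none is open yet or the current one is full.
      if (currentNewGroup == null || insertsInCurrentGroup == maxInsertsPerNewGroup) {
        currentNewGroup = "new-fg-" + newGroupCount++;
        insertsInCurrentGroup = 0;
      }
      insertsInCurrentGroup++;
      assignments.put(r.key(), new Assignment(Target.NEW_FILE_GROUP, currentNewGroup));
    }
    return assignments;
  }

  public static void main(String[] args) {
    List<TaggedRecord> batch = List.of(
        new TaggedRecord("k1", "fg-1"),  // update of a key stored in fg-1
        new TaggedRecord("k2", null),    // insert
        new TaggedRecord("k3", "fg-2"),  // delete of a key stored in fg-2
        new TaggedRecord("k4", null),    // insert
        new TaggedRecord("k5", null));   // insert
    assign(batch, 2).forEach((key, a) -> System.out.println(key + " -> " + a));
  }
}
```

The cap on inserts per new file group is only there to show that inserts still get bucketed; how Hudi would actually size new file groups under NBCC is not specified here.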
I'm thinking that, if we relax the condition to support NBCC with the
simple/bloom/RLI index, inserts of the same key from concurrent writers may not
be identified with a regular (non-bucket) index. That is also a limitation of
OCC, so we can document it and make users aware of it.
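To illustrate the limitation, here is a toy simulation (not Hudi code; every name is hypothetical) in which two writers read the same committed index and then insert the same new key before either commits. With a lookup-based index, standing in for simple/bloom/RLI, each writer routes the unseen key to its own new file group, so the key ends up in two file groups. With a bucket index, the file group is derived from the key's hash, so both writers land in the same one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ConcurrentInsertSketch {

  static final int NUM_BUCKETS = 4;

  /**
   * Picks the file group a writer sends a key to. With a bucket index the location is a pure
   * function of the key; with a lookup-based index the writer only sees keys committed before
   * it read the index, so an unseen key goes to a new file group owned by that writer.
   */
  static String route(String key, Map<String, String> indexSnapshot, boolean bucketIndex,
                      String writerId) {
    if (bucketIndex) {
      return "bucket-" + Math.floorMod(key.hashCode(), NUM_BUCKETS);
    }
    String existing = indexSnapshot.get(key);
    return existing != null ? existing : "new-fg-" + writerId;
  }

  /** Two writers snapshot the same committed index, then both write `key` before either commits. */
  static Set<String> fileGroupsHoldingKey(String key, Map<String, String> committedIndex,
                                          boolean bucketIndex) {
    Map<String, String> snapshotA = new HashMap<>(committedIndex);
    Map<String, String> snapshotB = new HashMap<>(committedIndex);
    Set<String> groups = new TreeSet<>();
    groups.add(route(key, snapshotA, bucketIndex, "writerA"));
    groups.add(route(key, snapshotB, bucketIndex, "writerB"));
    return groups;
  }

  public static void main(String[] args) {
    Map<String, String> committed = Map.of("user-1", "fg-7");
    // New key, lookup index: each writer creates its own file group, so the key is duplicated.
    System.out.println(fileGroupsHoldingKey("user-42", committed, false));
    // New key, bucket index: both writers target one file group, where records can be merged by key.
    System.out.println(fileGroupsHoldingKey("user-42", committed, true));
    // Already-committed key, lookup index: both writers find it and write to fg-7.
    System.out.println(fileGroupsHoldingKey("user-1", committed, false));
  }
}
```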
> Expand NBCC to support other index types on Spark
> -------------------------------------------------
>
> Key: HUDI-8300
> URL: https://issues.apache.org/jira/browse/HUDI-8300
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Y Ethan Guo
> Priority: Critical
> Fix For: 1.0.0
>
>
> Right now, NBCC is only supported with the simple bucket index on MOR tables.
> We can consider relaxing this so that use cases like ingestion with concurrent
> GDPR deletes can be supported by NBCC, using the simple/global/RLI index on
> Spark.
> {code:java}
> // Current guard: NBCC is only allowed on MOR tables with the simple bucket index.
> if (writeConcurrencyMode == WriteConcurrencyMode.NON_BLOCKING_CONCURRENCY_CONTROL) {
>   checkArgument(
>       writeConfig.getTableType().equals(HoodieTableType.MERGE_ON_READ)
>           && writeConfig.isSimpleBucketIndex(),
>       "Non-blocking concurrency control requires the MOR table with simple bucket index");
> }
> {code}
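The check quoted above could be relaxed roughly as follows. This is a hypothetical sketch: the enums are local stand-ins rather than Hudi's `HoodieTableType` and `WriteConcurrencyMode` classes, `IndexKind` and `NBCC_INDEXES` are invented names, and the allow-list follows the comment's "simple/bloom/RLI" wording. The description also mentions a global index, so the exact set is still open.

```java
import java.util.EnumSet;
import java.util.Set;

class RelaxedNbccCheckSketch {

  // Local stand-ins for the Hudi enums used in the original check (not the real Hudi classes).
  enum TableType { COPY_ON_WRITE, MERGE_ON_READ }
  enum ConcurrencyMode { OPTIMISTIC_CONCURRENCY_CONTROL, NON_BLOCKING_CONCURRENCY_CONTROL }
  enum IndexKind { SIMPLE_BUCKET, SIMPLE, BLOOM, RECORD_INDEX, OTHER }

  // Hypothetical allow-list: today's simple bucket index plus the indexes proposed for Spark.
  static final Set<IndexKind> NBCC_INDEXES =
      EnumSet.of(IndexKind.SIMPLE_BUCKET, IndexKind.SIMPLE, IndexKind.BLOOM, IndexKind.RECORD_INDEX);

  static void validate(ConcurrencyMode mode, TableType tableType, IndexKind index) {
    if (mode != ConcurrencyMode.NON_BLOCKING_CONCURRENCY_CONTROL) {
      return; // Only NBCC is restricted by this check.
    }
    if (tableType != TableType.MERGE_ON_READ || !NBCC_INDEXES.contains(index)) {
      throw new IllegalArgumentException(
          "Non-blocking concurrency control requires a MOR table with one of " + NBCC_INDEXES
              + " but got " + tableType + " with " + index);
    }
  }

  public static void main(String[] args) {
    validate(ConcurrencyMode.NON_BLOCKING_CONCURRENCY_CONTROL, TableType.MERGE_ON_READ,
        IndexKind.RECORD_INDEX);
    System.out.println("MOR + RECORD_INDEX accepted");
    try {
      validate(ConcurrencyMode.NON_BLOCKING_CONCURRENCY_CONTROL, TableType.COPY_ON_WRITE,
          IndexKind.SIMPLE);
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Keeping the MOR requirement reflects the comment's point that NBCC writers only produce log files for updates/deletes, which copy-on-write tables do not support.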