[jira] [Comment Edited] (HUDI-8300) Expand NBCC to support other index types on Spark

Y Ethan Guo (Jira) Thu, 03 Oct 2024 18:01:15 -0700


    [ 
https://issues.apache.org/jira/browse/HUDI-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886814#comment-17886814
 ]


Y Ethan Guo edited comment on HUDI-8300 at 10/4/24 1:00 AM:
------------------------------------------------------------

At a high level, the partitioner needs to handle bucketing properly and must 
not do small file handling, i.e., it must not merge inserts, updates, and 
deletes into existing base files.  The writer in NBCC only generates log 
files for updates/deletes (in some cases, we can still do conflict checks on 
new file slices).

I think inserts may not be identified across concurrent writers for a 
regular index in NBCC if we relax the condition to support NBCC with 
simple/bloom/RLI index.  That is also a limitation of OCC, so we can 
document it so that users are aware of it.
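
For illustration only, the routing rule described above could be sketched 
roughly as follows.  This is not Hudi's partitioner code: the class, enum, 
and method names are hypothetical, and sending NBCC inserts to a new file 
group is an assumption read from the remark about new file slices.

```java
/** Hypothetical routing sketch; this is not Hudi's actual partitioner API. */
final class NbccRoutingSketch {

  /** Where a record ends up being written. */
  enum Target {
    LOG_FILE_OF_EXISTING_FILE_GROUP,
    EXISTING_SMALL_BASE_FILE,
    NEW_FILE_GROUP
  }

  /**
   * @param nbccEnabled        whether the writer runs under NBCC
   * @param foundInIndex       whether the index located the record key
   *                           (i.e. the record is an update or a delete)
   * @param smallFileAvailable whether small file handling found a base
   *                           file that could absorb more inserts
   */
  static Target route(boolean nbccEnabled, boolean foundInIndex,
                      boolean smallFileAvailable) {
    if (foundInIndex) {
      // Updates and deletes only produce log files for the file group
      // that already holds the key.
      return Target.LOG_FILE_OF_EXISTING_FILE_GROUP;
    }
    if (!nbccEnabled && smallFileAvailable) {
      // Small file handling: pack the insert into an existing base file.
      return Target.EXISTING_SMALL_BASE_FILE;
    }
    // Assumption: under NBCC inserts never touch existing base files,
    // so they open a new file group instead.
    return Target.NEW_FILE_GROUP;
  }
}
```

In this sketch, the only behavioral difference under NBCC is that the small 
file branch is never taken for inserts.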



> Expand NBCC to support other index types on Spark
> -------------------------------------------------
>
>                 Key: HUDI-8300
>                 URL: https://issues.apache.org/jira/browse/HUDI-8300
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Y Ethan Guo
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> Right now, NBCC is only supported with simple bucket index on MOR.  We can 
> consider relaxing this for use cases like ingestion with concurrent GDPR 
> deletes to be supported by NBCC, using simple/global/RLI index on Spark.
> {code:java}
> if (writeConcurrencyMode == WriteConcurrencyMode.NON_BLOCKING_CONCURRENCY_CONTROL) {
>   checkArgument(
>       writeConfig.getTableType().equals(HoodieTableType.MERGE_ON_READ)
>           && writeConfig.isSimpleBucketIndex(),
>       "Non-blocking concurrency control requires the MOR table with simple bucket index");
> }
> {code}
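
A relaxed version of the check quoted above might look roughly like the 
sketch below.  It is self-contained and uses hypothetical stand-in enums 
rather than Hudi's real `HoodieTableType` and index type classes.  The set 
of allowed index types is an assumption taken from the index names mentioned 
in this thread (simple, bloom, global, RLI), not a confirmed design.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of a relaxed NBCC precondition; not Hudi's code. */
final class NbccValidationSketch {

  // Stand-ins for Hudi's table and index types (names are illustrative).
  enum TableType { COPY_ON_WRITE, MERGE_ON_READ }
  enum IndexType { SIMPLE_BUCKET, SIMPLE, BLOOM, GLOBAL_SIMPLE, RECORD_LEVEL, OTHER }

  // Assumption: the index types NBCC would accept after the relaxation.
  static final Set<IndexType> NBCC_INDEX_TYPES = EnumSet.of(
      IndexType.SIMPLE_BUCKET, IndexType.SIMPLE, IndexType.BLOOM,
      IndexType.GLOBAL_SIMPLE, IndexType.RECORD_LEVEL);

  /** Throws IllegalArgumentException if NBCC cannot be used. */
  static void validate(TableType tableType, IndexType indexType) {
    boolean supported = tableType == TableType.MERGE_ON_READ
        && NBCC_INDEX_TYPES.contains(indexType);
    if (!supported) {
      throw new IllegalArgumentException(
          "Non-blocking concurrency control requires the MOR table with one of "
              + NBCC_INDEX_TYPES);
    }
  }
}
```

The MOR requirement is kept as-is; only the index condition is widened from 
a single type to a set.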



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
