[jira] [Updated] (HUDI-6962) Correct the behavior of bulk insert for NB-CC

Jing Zhang (Jira) Thu, 19 Oct 2023 20:05:14 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jing Zhang updated HUDI-6962:
-----------------------------
    Description: 
How should the case be handled when the multiple writers include a job with a 
bulk insert operation?
1. Generated file group ID: Generate a fixed file group ID, because all 
subsequent jobs will use the fixed file group ID suffix instead of a random 
UUID suffix. The behavior needs to be consistent to prevent later writer jobs 
from writing records with the same primary key to different file groups.
2. Deal with the transaction: The conflict resolution of bulk insert cannot be 
deferred to the compaction phase. Because bulk insert writers flush data into 
base files, if there are multiple bulk insert jobs, there might exist multiple 
base files in the same bucket.
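
The two points above can be sketched roughly as follows. This is a minimal 
illustration only, not Hudi's actual API: the class name, the method names, 
the zero-padded 8-digit bucket prefix, and the all-zero fixed suffix are all 
assumptions made for the example. It shows why a fixed suffix makes every 
writer resolve a bucket to the same file group ID, and how two bulk insert 
jobs that wrote base files for the same file group could be detected at 
commit time instead of being left to compaction.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical sketch: names and ID layout are assumptions, not Hudi's API.
final class BucketFileGroupSketch {

    // Zero-padded bucket prefix, e.g. bucket 3 -> "00000003" (assumed layout).
    static String bucketPrefix(int bucketId) {
        return String.format(Locale.ROOT, "%08d", bucketId);
    }

    // Random-suffix ID: two writers targeting the same bucket get different
    // IDs, so records with the same key could land in different file groups.
    static String randomFileGroupId(int bucketId) {
        String uuid = UUID.randomUUID().toString();
        // Replace the first 8 characters of the UUID with the bucket prefix.
        return bucketPrefix(bucketId) + uuid.substring(8);
    }

    // Fixed-suffix ID: every writer derives the same ID for the same bucket.
    static String fixedFileGroupId(int bucketId) {
        return bucketPrefix(bucketId) + "-0000-0000-0000-000000000000";
    }

    // Given the file groups each bulk insert job wrote base files for,
    // return (sorted) the file groups touched by more than one job.
    static List<String> conflictingFileGroups(Map<String, List<String>> fileGroupsByJob) {
        Map<String, Integer> writerCount = new TreeMap<>();
        for (List<String> fileGroups : fileGroupsByJob.values()) {
            // Count each job at most once per file group.
            for (String fileGroup : new HashSet<>(fileGroups)) {
                writerCount.merge(fileGroup, 1, Integer::sum);
            }
        }
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : writerCount.entrySet()) {
            if (entry.getValue() > 1) {
                conflicts.add(entry.getKey());
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        System.out.println("random: " + randomFileGroupId(3));
        System.out.println("fixed:  " + fixedFileGroupId(3));
    }
}
```

In this sketch a non-empty result from conflictingFileGroups would mean one 
of the bulk insert commits has to be rejected or retried; how the real 
transaction handling should behave is the open question of this issue.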

  was:
How should the case be handled when the multiple writers include a job with a 
bulk insert operation?
1. Generated file group ID: Generate a fixed file group ID, because all 
subsequent jobs will use the fixed file group ID suffix instead of a random 
UUID suffix. The behavior needs to be consistent to prevent later writer jobs 
from writing records with the same primary key to different file groups.
2. Resolve conflict: The conflict resolution of bulk insert cannot be deferred 
to the compaction phase. Because bulk insert writers flush data into base 
files, if there are multiple bulk insert jobs, there might exist multiple base 
files in the same bucket.


> Correct the behavior of bulk insert for NB-CC 
> ----------------------------------------------
>
>                 Key: HUDI-6962
>                 URL: https://issues.apache.org/jira/browse/HUDI-6962
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Jing Zhang
>            Assignee: Jing Zhang
>            Priority: Major
>
> How should the case be handled when the multiple writers include a job with 
> a bulk insert operation?
> 1. Generated file group ID: Generate a fixed file group ID, because all 
> subsequent jobs will use the fixed file group ID suffix instead of a random 
> UUID suffix. The behavior needs to be consistent to prevent later writer 
> jobs from writing records with the same primary key to different file groups.
> 2. Deal with the transaction: The conflict resolution of bulk insert cannot 
> be deferred to the compaction phase. Because bulk insert writers flush data 
> into base files, if there are multiple bulk insert jobs, there might exist 
> multiple base files in the same bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
