[
https://issues.apache.org/jira/browse/HUDI-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhang updated HUDI-6962:
-----------------------------
Description:
How should the case be handled where one of the multiple writers is a job with
a bulk insert operation?
1. Generated file group ID: Generate a fixed file group ID, because all
subsequent jobs will use the fixed file group ID suffix instead of a random UUID
suffix. The behavior needs to be consistent to prevent later writer jobs from
writing records with the same primary key to different file groups.
2. Deal with the transaction: The conflict resolution of bulk insert cannot be
deferred to the compaction phase. Because bulk insert writers flush data into
base files, multiple bulk insert jobs might produce multiple base files in the
same bucket.
was:
How should the case be handled where one of the multiple writers is a job with
a bulk insert operation?
1. Generated file group ID: Generate a fixed file group ID, because all
subsequent jobs will use the fixed file group ID suffix instead of a random UUID
suffix. The behavior needs to be consistent to prevent later writer jobs from
writing records with the same primary key to different file groups.
2. Resolve conflict: The conflict resolution of bulk insert cannot be deferred
to the compaction phase. Because bulk insert writers flush data into base
files, multiple bulk insert jobs might produce multiple base files in the same
bucket.
> Correct the behavior of bulk insert for NB-CC
> ----------------------------------------------
>
> Key: HUDI-6962
> URL: https://issues.apache.org/jira/browse/HUDI-6962
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Jing Zhang
> Assignee: Jing Zhang
> Priority: Major
>
> How should the case be handled where one of the multiple writers is a job
> with a bulk insert operation?
> 1. Generated file group ID: Generate a fixed file group ID, because all
> subsequent jobs will use the fixed file group ID suffix instead of a random
> UUID suffix. The behavior needs to be consistent to prevent later writer jobs
> from writing records with the same primary key to different file groups.
> 2. Deal with the transaction: The conflict resolution of bulk insert cannot
> be deferred to the compaction phase. Because bulk insert writers flush data
> into base files, multiple bulk insert jobs might produce multiple base files
> in the same bucket.
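
The two points in the description can be illustrated with a minimal Java
sketch. Everything in it is an assumption made for illustration: the class
BucketFileGroupSketch, the constant FIXED_SUFFIX, the ID layout, and the
methods bucketOf, fixedFileGroupId and conflictingFileGroups are hypothetical
and are not Hudi APIs. It only shows why a deterministic file group ID keeps
records with the same key in one file group, and how two bulk insert jobs that
each write a base file into the same bucket could be detected at commit time
rather than at compaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class BucketFileGroupSketch {

    // Hypothetical fixed suffix that stands in for the random UUID part of a file group ID.
    static final String FIXED_SUFFIX = "0000-4000-8000-000000000000";

    // Map a record key to a bucket number (simple hash bucketing, assumed for illustration).
    static int bucketOf(String recordKey, int numBuckets) {
        return Math.floorMod(recordKey.hashCode(), numBuckets);
    }

    // Deterministic file group ID: zero-padded bucket number plus the fixed suffix.
    // Every writer that derives the ID this way gets the same ID for the same bucket.
    static String fixedFileGroupId(int bucket) {
        return String.format(Locale.ROOT, "%08d-%s", bucket, FIXED_SUFFIX);
    }

    // Given, per writer, the file group IDs it wrote base files into, return the
    // file group IDs touched by more than one writer, in sorted order.
    static List<String> conflictingFileGroups(List<List<String>> baseFileGroupsPerWriter) {
        Map<String, Integer> writersPerGroup = new TreeMap<>();
        for (List<String> groups : baseFileGroupsPerWriter) {
            // Deduplicate so that one writer counts at most once per file group.
            for (String id : new TreeSet<>(groups)) {
                writersPerGroup.merge(id, 1, Integer::sum);
            }
        }
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, Integer> e : writersPerGroup.entrySet()) {
            if (e.getValue() > 1) {
                conflicts.add(e.getKey());
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        // Two independent writers resolve the same key to the same file group.
        String idFromJobA = fixedFileGroupId(bucketOf("user-42", numBuckets));
        String idFromJobB = fixedFileGroupId(bucketOf("user-42", numBuckets));
        System.out.println(idFromJobA.equals(idFromJobB));

        // Two bulk insert jobs both wrote a base file into bucket 1.
        List<List<String>> written = List.of(
                List.of(fixedFileGroupId(1), fixedFileGroupId(2)),
                List.of(fixedFileGroupId(1)));
        System.out.println(conflictingFileGroups(written));
    }
}
```

With random UUID suffixes the two jobs in main would generally produce
different IDs for the same bucket, which is the inconsistency point 1 is meant
to prevent. The conflict check corresponds to point 2: the overlap has to be
handled when the bulk insert commits, because the data is already in base
files.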