Bo Cui created HUDI-4078:
----------------------------

             Summary: BootstrapOperator cannot load all index data
                 Key: HUDI-4078
                 URL: https://issues.apache.org/jira/browse/HUDI-4078
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Bo Cui


The BootstrapOperator cannot obtain all the parquet and log files from
hoodieTable#getSliceView()#getLatestFileSlicesBeforeOrOn, so not all index data is loaded.

Procedure:
1) Write 10k records to the Hudi table in streaming mode. The table is created with:
create table() with (
  'table.type' = 'MERGE_ON_READ',
  'index.bootstrap.enabled' = 'true',
  'archive.max_commits' = '4200',
  'archive.min_commits' = '4000',
  'clean.retain_commits' = '3999',
  ...
)
2) Stop the job and delete the last compaction commit, e.g.
`.hoodie/20220505131426.commit`.
3) Restart the job without a checkpoint/savepoint and without writing any data.
4) Observe how much index data is loaded by the BootstrapOperator.
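
For reference, the table definition in step 1 might look roughly like the Flink SQL sketch below. The table name, columns, and path are hypothetical placeholders that are not part of the original report; only the options listed in step 1 come from it, and the other options elided there are not reproduced.

```sql
-- Hypothetical table name, columns, and path (assumptions for illustration).
-- Only the options from 'table.type' onward are taken from the report.
CREATE TABLE hudi_bootstrap_test (
  id INT PRIMARY KEY NOT ENFORCED,
  name STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_bootstrap_test',  -- placeholder location
  'table.type' = 'MERGE_ON_READ',
  'index.bootstrap.enabled' = 'true',
  'archive.max_commits' = '4200',
  'archive.min_commits' = '4000',
  'clean.retain_commits' = '3999'
);
```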



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
