[ https://issues.apache.org/jira/browse/HUDI-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen updated HUDI-4270:
-----------------------------
Fix Version/s: 0.12.0
> Bootstrap operation data loading missing
> ----------------------------------------
>
> Key: HUDI-4270
> URL: https://issues.apache.org/jira/browse/HUDI-4270
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Bo Cui
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
>
>
> [https://github.com/apache/hudi/issues/4558]
> Procedure:
> 1. The file system used by Hudi supports append (for example, HDFS; the local
> file system does not support append).
> 2. Use `hoodie.logfile.max.size` to limit the log file size so that multiple
> log files are generated (for example, log#1 and log#2).
> 3. The data of the last instant (for example, 20220616180000) is written to
> log#2, but the JobManager (JM) fails to commit instant 20220616180000.
> 4. Restart the Flink job.
> 4.1 The Bootstrap operator loads the full index.
> 4.2 The job rolls back the data of instant 20220616180000 with rollback
> instant 20220616180010 and appends a rollback block to log#2.
> 5. In this case, the maximum instant in log#2 is 20220616180010, while the
> maximum instant known to the Bootstrap operator is 20220616180000
> ([https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java#L195]).
> 6. If log#2 is read first, both log#2 and log#1 are skipped, because
> 20220616180010 is later than 20220616180000
> ([https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java#L236]).
> 7. As a result, the data in log#1 is never loaded.
> Solution: sort the log files in ascending order (oldest first).
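The failure in steps 5 to 7 and the proposed fix can be illustrated with a minimal Java sketch. It is a simplified model, not Hudi code: the `Block` and `LogFile` records, the `scan` and `sortAscending` methods, and the instant 20220616170000 used for log#1 are all hypothetical. `scan` assumes the reader stops at the first block whose instant is later than the latest instant known to the Bootstrap operator, which is the behavior step 6 describes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BootstrapLogOrderSketch {
    // Hypothetical, simplified stand-ins; these are NOT Hudi classes.
    record Block(String instantTime, String payload) {}
    record LogFile(int version, List<Block> blocks) {}

    // Assumed reader behavior (step 6): blocks are read file by file in the given order,
    // and scanning ends at the first block whose instant is later than latestInstant.
    static List<String> scan(List<LogFile> files, String latestInstant) {
        List<String> loaded = new ArrayList<>();
        for (LogFile file : files) {
            for (Block block : file.blocks()) {
                // Instant times are fixed-width yyyyMMddHHmmss strings, so lexicographic
                // comparison matches chronological order.
                if (block.instantTime().compareTo(latestInstant) > 0) {
                    return loaded; // stop: remaining blocks and files are never read
                }
                loaded.add(block.payload());
            }
        }
        return loaded;
    }

    // The proposed fix: order log files by version, oldest first (numeric, so 10 sorts after 2).
    static List<LogFile> sortAscending(List<LogFile> files) {
        List<LogFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt(LogFile::version));
        return sorted;
    }

    public static void main(String[] args) {
        String latestInstant = "20220616180000"; // last instant known to the Bootstrap operator
        // Hypothetical committed data in log#1.
        LogFile log1 = new LogFile(1, List.of(new Block("20220616170000", "records-in-log1")));
        // Only the appended rollback block is modeled for log#2.
        LogFile log2 = new LogFile(2, List.of(new Block("20220616180010", "rollback-block")));

        List<LogFile> newestFirst = List.of(log2, log1);
        System.out.println(scan(newestFirst, latestInstant));                // prints []
        System.out.println(scan(sortAscending(newestFirst), latestInstant)); // prints [records-in-log1]
    }
}
```

Sorting by the numeric version rather than by file name keeps a hypothetical log#10 after log#2, so the oldest data is always scanned before any later block can end the scan.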
--
This message was sent by Atlassian Jira
(v8.20.7#820007)