[ 
https://issues.apache.org/jira/browse/HUDI-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4270:
---------------------------------
    Labels: pull-request-available  (was: )

> Bootstrap operation data loading missing
> ----------------------------------------
>
>                 Key: HUDI-4270
>                 URL: https://issues.apache.org/jira/browse/HUDI-4270
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Bo Cui
>            Priority: Major
>              Labels: pull-request-available
>
> [https://github.com/apache/hudi/issues/4558]
> Procedure:
> 1. The file system backing Hudi supports append (for example, HDFS; the 
> local file system does not support append).
> 2. Use `hoodie.logfile.max.size` to limit the log file size so that 
> multiple log files are generated (for example, log#1 and log#2).
> 3. The last instant (for example, 20220616180000) is written to log#2, but 
> the JobManager (JM) fails to commit instant 20220616180000.
> 4. Restart the Flink job.
>   4.1 The Bootstrap operator loads the entire index.
>   4.2 The job rolls back the data of instant 20220616180000; the rollback 
> instant is 20220616180010, and a rollback block is appended to log#2.
> 5. At this point, the maximum instant in log#2 is 20220616180010, while the 
> maximum instant used by the Bootstrap operator is 20220616180000 
> ([https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java#L195]).
> 6. If log#2 is read first, both log#2 and log#1 are skipped because 
> 20220616180010 is greater than 20220616180000 
> ([https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java#L236])
> 7. As a result, the data in log#1 is not loaded.
> Solution: sort the log files in ascending order.
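
The ordering problem in steps 5 to 7 can be illustrated with a minimal, self-contained sketch. It is a simplified model, not Hudi's reader: `LogOrderSketch`, `LogFile`, `loadedVersions` and `ascending` are hypothetical names, each log file is reduced to a version number plus its largest instant time, the scan is assumed to stop at the first file whose instant exceeds the reader's limit, and the instant 20220616170000 for log#1 is an assumed value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class LogOrderSketch {

    /** Hypothetical stand-in for a log file: version number and largest instant time it contains. */
    static final class LogFile {
        final int version;
        final String maxInstant;

        LogFile(int version, String maxInstant) {
            this.version = version;
            this.maxInstant = maxInstant;
        }
    }

    /**
     * Simplified scan (an assumption, not the real reader): visit files in the
     * given order and stop at the first one whose instant time is greater than
     * the reader's limit. Fixed-width timestamps compare correctly as strings.
     */
    static List<Integer> loadedVersions(List<LogFile> files, String latestInstant) {
        List<Integer> loaded = new ArrayList<>();
        for (LogFile f : files) {
            if (f.maxInstant.compareTo(latestInstant) > 0) {
                break; // everything after this file is skipped as well
            }
            loaded.add(f.version);
        }
        return loaded;
    }

    /** Returns a copy of the input sorted by ascending log version. */
    static List<LogFile> ascending(List<LogFile> files) {
        List<LogFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt((LogFile f) -> f.version));
        return sorted;
    }

    public static void main(String[] args) {
        String latestInstant = "20220616180000"; // limit used by the Bootstrap operator
        List<LogFile> unordered = Arrays.asList(
            new LogFile(2, "20220616180010"),  // log#2 ends with the rollback block
            new LogFile(1, "20220616170000")); // log#1, assumed earlier instant

        System.out.println(loadedVersions(unordered, latestInstant));            // prints []
        System.out.println(loadedVersions(ascending(unordered), latestInstant)); // prints [1]
    }
}
```

In this model, reading log#2 first loads nothing, while reading the files in ascending order loads log#1 before the scan stops at log#2.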



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
