[ https://issues.apache.org/jira/browse/HUDI-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bo Cui updated HUDI-4270:
-------------------------
Description:
[https://github.com/apache/hudi/issues/4558]
Steps to reproduce:
1. The file system used by Hudi supports append (for example, HDFS; the local
file system does not support append).
2. Use `hoodie.logfile.max.size` to limit the log file size so that multiple
log files are generated (for example, log#1 and log#2).
3. The data of the last instant in log#2 (for example, 20220616180000) is
written, but the JobManager (JM) fails to commit instant 20220616180000.
4. Restart the Flink job.
4.1 The Bootstrap operator loads the full index.
4.2 The job rolls back the data of instant 20220616180000 using rollback
instant 20220616180010 and appends a rollback block to log#2.
5. At this point, the maximum instant in log#2 is 20220616180010, while the
maximum instant used by the Bootstrap operator is 20220616180000
([https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java#L195]).
6. If log#2 is read first, the log reader stops scanning when it reaches the
block with instant 20220616180010, because 20220616180010 is greater than
20220616180000, so both log#2 and log#1 are skipped
([https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java#L236]).
7. As a result, the data in log#1 is never loaded.
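The failure mode in steps 5 to 7 can be illustrated with a small, self-contained model. This is a sketch, not Hudi code: `LogBlock`, `LogFile`, `scan` and `demo` are hypothetical names, the log#1 instant 20220616170000 and the record keys k1 to k3 are made up, and the scan only mimics the early exit on a block whose instant is later than the maximum instant (rollback handling is deliberately not modeled).

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyBreakScanDemo {
    // Hypothetical model of a log block: the instant that wrote it and the record keys it carries.
    record LogBlock(String instant, List<String> keys) {}

    // Hypothetical model of a log file: a name plus its blocks in append order.
    record LogFile(String name, List<LogBlock> blocks) {}

    // Simplified scan: as soon as a block newer than latestInstant is seen,
    // scanning stops for ALL remaining files, not just the current one.
    // Rollback semantics are not modeled; this only shows which blocks are visited.
    static List<String> scan(List<LogFile> files, String latestInstant) {
        List<String> loaded = new ArrayList<>();
        outer:
        for (LogFile file : files) {
            for (LogBlock block : file.blocks()) {
                // Instants are fixed-width yyyyMMddHHmmss strings, so string
                // comparison matches chronological order.
                if (block.instant().compareTo(latestInstant) > 0) {
                    break outer;
                }
                loaded.addAll(block.keys());
            }
        }
        return loaded;
    }

    // Builds the scenario from the steps above and scans it in the requested file order.
    static List<String> demo(boolean log1First) {
        LogFile log1 = new LogFile("log#1", List.of(
                new LogBlock("20220616170000", List.of("k1", "k2"))));
        LogFile log2 = new LogFile("log#2", List.of(
                new LogBlock("20220616180000", List.of("k3")), // instant whose commit failed
                new LogBlock("20220616180010", List.of())));   // rollback block appended after restart
        List<LogFile> order = log1First ? List.of(log1, log2) : List.of(log2, log1);
        return scan(order, "20220616180000");
    }

    public static void main(String[] args) {
        System.out.println("log#2 first: " + demo(false)); // prints "log#2 first: [k3]"
        System.out.println("log#1 first: " + demo(true));  // prints "log#1 first: [k1, k2, k3]"
    }
}
```

In this model, reading log#2 first visits only the block before the rollback block and then stops, so k1 and k2 from log#1 are lost; reading the files oldest first visits log#1 before the early exit is triggered.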
Solution: sort the log files in ascending order before reading them, so that
log#1 is scanned before log#2.
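The proposed fix amounts to ordering the log files of a file slice from oldest to newest before scanning. A minimal sketch, assuming Hudi-style log file names of the form `.<fileId>_<baseInstant>.log.<version>_<writeToken>`; `LogFileOrder`, `LogName` and `sortAscending` are hypothetical helpers, not Hudi's API, and the example file names are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFileOrder {
    // Assumed naming scheme: .<fileId>_<baseInstant>.log.<version>_<writeToken>
    private static final Pattern LOG_NAME =
            Pattern.compile("^\\.(.+)_(\\d+)\\.log\\.(\\d+)(?:_(.*))?$");

    // Hypothetical parsed form of a log file name.
    record LogName(String fileName, String baseInstant, int version) {
        static LogName parse(String fileName) {
            Matcher m = LOG_NAME.matcher(fileName);
            if (!m.matches()) {
                throw new IllegalArgumentException("Not a log file name: " + fileName);
            }
            return new LogName(fileName, m.group(2), Integer.parseInt(m.group(3)));
        }
    }

    // Ascending order: older base instant first, then lower log version first.
    // The version is compared numerically so that log.10 sorts after log.9.
    static final Comparator<LogName> ASCENDING =
            Comparator.comparing(LogName::baseInstant).thenComparingInt(LogName::version);

    // Returns the given log file names sorted oldest to newest.
    static List<String> sortAscending(List<String> fileNames) {
        List<LogName> parsed = new ArrayList<>();
        for (String name : fileNames) {
            parsed.add(LogName.parse(name));
        }
        parsed.sort(ASCENDING);
        List<String> result = new ArrayList<>();
        for (LogName n : parsed) {
            result.add(n.fileName());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                ".f1_20220616170000.log.2_0-1-0",
                ".f1_20220616170000.log.10_0-1-0",
                ".f1_20220616170000.log.1_0-1-0");
        System.out.println(sortAscending(names));
    }
}
```

Comparing the version numerically matters here: a plain string sort of the file names would place `log.10` before `log.2`.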
was:
[https://github.com/apache/hudi/issues/4558]
Steps to reproduce:
1. The file system used by Hudi supports append (for example, HDFS; the local
file system does not support append).
2. Use `hoodie.logfile.max.size` to limit the log file size so that multiple
log files are generated (for example, log#1 and log#2).
3. The data of the last instant in log#2 (for example, 20220616180000) is
written, but the JobManager (JM) fails to commit instant 20220616180000.
4. Restart the Flink job.
4.1 The index operator loads the full index.
4.2 The job rolls back the data of instant 20220616180000 using rollback
instant 20220616180010 and appends a rollback block to log#2.
5. At this point, the maximum instant in log#2 is 20220616180010, while the
maximum instant in the index operator is 20220616180000
([https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java#L195]).
6. If log#2 is read first, both log#2 and log#1 are skipped because
20220616180010 is greater than 20220616180000
([https://github.com/apache/hudi/blob/0ff34b697416fc697dfa96a80496b74598a95263/hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java#L236]).
7. As a result, the data in log#1 is never loaded.
Solution: sort the log files in ascending order before reading them.
> Bootstrap operation data loading missing
> ----------------------------------------
>
> Key: HUDI-4270
> URL: https://issues.apache.org/jira/browse/HUDI-4270
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Bo Cui
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.7#820007)