[ https://issues.apache.org/jira/browse/HUDI-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-2780:
-----------------------------
Flagged: (was: Impediment)
> MOR reads the log file and skips a complete block as a corrupt block, resulting
> in data loss
> ------------------------------------------------------------------------------------------
>
> Key: HUDI-2780
> URL: https://issues.apache.org/jira/browse/HUDI-2780
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: jing
> Assignee: jing
> Priority: Critical
> Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
> Attachments: image-2021-11-17-15-45-33-031.png,
> image-2021-11-17-15-46-04-313.png, image-2021-11-17-15-46-14-694.png
>
>
> Debugging the data inside the corrupt block shows that the lost records fall
> within the corrupt block's offset range. Because the reader hits EOF and skips
> the block, the compaction merge never writes those records to the Parquet file
> at that instant, even though the deltacommit for that instant succeeded. There
> are two consecutive HUDI magic markers in the middle of the corrupt block:
> reading the blocksize at the next position actually decodes bytes of the second
> #HUDI# magic as 1227030528, which exceeds the file size and therefore raises
> the EOF exception.
> !image-2021-11-17-15-45-33-031.png!
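> For reference, 1227030528 is exactly what the trailing bytes of a second
> #HUDI# magic decode to when they land under the size field. A quick standalone
> check (plain JDK, not Hudi code; the 4-byte big-endian decode is assumed here
> only to reproduce the reported number):
> {code:java}
> import java.nio.ByteBuffer;
> import java.nio.charset.StandardCharsets;
>
> public class BogusBlockSizeDemo {
>   public static void main(String[] args) {
>     byte[] magic = "#HUDI#".getBytes(StandardCharsets.US_ASCII);
>     // Size field decoded over the last two magic bytes 'I', '#' plus two zero bytes.
>     byte[] sizeField = {magic[4], magic[5], 0x00, 0x00};
>     int bogusSize = ByteBuffer.wrap(sizeField).getInt();
>     // Prints 1227030528 (0x49230000): far larger than the log file itself,
>     // so attempting to read that many bytes ends in an EOF exception.
>     System.out.println(bogusSize);
>   }
> }
> {code}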
> When detecting the position of the next block in order to skip the corrupt
> block, the scan should not start from the position after the blocksize has
> been read, but from the position before the blocksize is read.
> !image-2021-11-17-15-46-04-313.png!
> !image-2021-11-17-15-46-14-694.png!
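> A minimal, self-contained sketch of that proposed scan behaviour, written
> against plain JDK I/O rather than Hudi's actual reader classes (the names
> MAGIC and scanForNextMagicOffset, and the 8-byte size-field width, are
> assumptions made only for illustration):
> {code:java}
> import java.io.IOException;
> import java.io.RandomAccessFile;
> import java.nio.charset.StandardCharsets;
> import java.util.Arrays;
>
> // Illustration of the fix described above: when a block is found to be
> // corrupt, scan for the next magic starting from the position BEFORE the
> // blocksize field was read, so a magic overlapping that field is not missed.
> public class CorruptBlockScanSketch {
>
>   private static final byte[] MAGIC = "#HUDI#".getBytes(StandardCharsets.US_ASCII);
>   private static final long SIZE_FIELD_LEN = 8; // assumed width of the blocksize field
>
>   // Returns the offset of the next magic marker at or after startPos,
>   // or the file length if no further block exists.
>   static long scanForNextMagicOffset(RandomAccessFile file, long startPos) throws IOException {
>     long fileLen = file.length();
>     byte[] window = new byte[MAGIC.length];
>     for (long pos = startPos; pos + MAGIC.length <= fileLen; pos++) {
>       file.seek(pos);
>       file.readFully(window);
>       if (Arrays.equals(window, MAGIC)) {
>         return pos; // the next block starts here
>       }
>     }
>     return fileLen; // nothing left; the rest of the file belongs to the corrupt block
>   }
>
>   public static void main(String[] args) throws IOException {
>     try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
>       // magicEnd = offset just past the corrupt block's magic, e.g. taken from a debug session.
>       long magicEnd = Long.parseLong(args[1]);
>
>       // Behaviour described in the issue: the scan starts after the blocksize
>       // field, so a magic beginning inside those bytes is skipped.
>       long buggyStart = magicEnd + SIZE_FIELD_LEN;
>       // Proposed behaviour: rewind to before the blocksize field was read.
>       long fixedStart = magicEnd;
>
>       System.out.println("next block (buggy scan start): " + scanForNextMagicOffset(file, buggyStart));
>       System.out.println("next block (fixed scan start): " + scanForNextMagicOffset(file, fixedStart));
>     }
>   }
> }
> {code}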
--
This message was sent by Atlassian Jira
(v8.20.1#820001)