[ 
https://issues.apache.org/jira/browse/HUDI-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718521#comment-17718521
 ] 

Sagar Sumit commented on HUDI-3694:
-----------------------------------

Unlike on S3 or GCS, writes are not atomic on HDFS. On S3 and GCS each log file 
is written out whole, and atomicity is guaranteed by the underlying filesystem: 
either the complete file is written or nothing is written at all. On HDFS, by 
contrast, a log block can be partially written, so the magic header of the next 
log block may be corrupt without the current block being bad. 

One possible solution is to have {{HoodieLogFileReader#isBlockCorrupted}} throw 
{{CorruptedLogFileException}} when the magic header is corrupt and the storage 
scheme is write-transactional ({{StorageSchemes.isWriteTransaction}}), since a 
partial write cannot occur there and the corruption must be genuine.
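
A minimal sketch of that check (the {{TRANSACTIONAL_SCHEMES}} set, the 
{{checkMagicHeader}} helper, and the exception type here are illustrative 
assumptions, not the actual Hudi implementation):

```java
import java.util.Arrays;
import java.util.Set;

// Sketch of the proposed corruption check, adapted from the comment above.
public class LogBlockCorruptionCheck {

  // Hudi log blocks start with the magic sequence "#HUDI#".
  static final byte[] MAGIC = new byte[] {'#', 'H', 'U', 'D', 'I', '#'};

  // Hypothetical subset of schemes whose file writes are atomic.
  static final Set<String> TRANSACTIONAL_SCHEMES = Set.of("s3", "s3a", "gs");

  static boolean isWriteTransactional(String scheme) {
    return TRANSACTIONAL_SCHEMES.contains(scheme);
  }

  /**
   * Returns true if the block should be treated as a recoverable corrupt
   * block (a partial write on a non-transactional store such as HDFS).
   * Throws when the magic header is bad on a transactional store, where a
   * partial write is impossible and the corruption must be genuine.
   */
  static boolean checkMagicHeader(byte[] headerBytes, String scheme) {
    boolean magicCorrupt = !Arrays.equals(headerBytes, MAGIC);
    if (magicCorrupt && isWriteTransactional(scheme)) {
      // Stand-in for throwing CorruptedLogFileException.
      throw new IllegalStateException(
          "Corrupt magic header on transactional store: " + scheme);
    }
    return magicCorrupt;
  }
}
```

With this split, a bad magic header on HDFS still falls through to the existing 
rollback/skip path, while the same condition on S3 or GCS surfaces as an error 
instead of silently abandoning the current block.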

> Not use magic number of next block to determine current log block
> -----------------------------------------------------------------
>
>                 Key: HUDI-3694
>                 URL: https://issues.apache.org/jira/browse/HUDI-3694
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: ZiyueGuan
>            Priority: Major
>
> HoodieLogFileReader uses the magic number of the next log block to determine 
> whether the current log block is corrupted. However, when the next block has a 
> corrupted magic number, the current block is abandoned, which leads to data loss.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)