[ 
https://issues.apache.org/jira/browse/HUDI-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3599:
-----------------------------
    Priority: Critical  (was: Major)

> Not atomicity commit could cause streaming read loss data
> ---------------------------------------------------------
>
>                 Key: HUDI-3599
>                 URL: https://issues.apache.org/jira/browse/HUDI-3599
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: core
>            Reporter: Xiaoqiao He
>            Priority: Critical
>             Fix For: 0.12.1
>
>
> The current `commit` call hierarchy is shown below. `transitionState` writes 
> the deltacommit file to complete the commit, but writing a file is not an 
> atomic operation on HDFS, for instance. 
> {code:java}
> HoodieActiveTimeline.transitionState(HoodieInstant, HoodieInstant, Option<byte[]>, boolean)  (org.apache.hudi.common.table.timeline)
>  HoodieActiveTimeline.transitionState(HoodieInstant, HoodieInstant, Option<byte[]>)  (org.apache.hudi.common.table.timeline)
>   HoodieActiveTimeline.saveAsComplete(HoodieInstant, Option<byte[]>)  (org.apache.hudi.common.table.timeline)
>    BaseHoodieWriteClient.commit(HoodieTable, String, String, HoodieCommitMetadata, List<HoodieWriteStat>)  (org.apache.hudi.client)
>     BaseHoodieWriteClient.commitStats(String, List<HoodieWriteStat>, Option<Map<String, String>>, String, Map<String, List<String>>)  (org.apache.hudi.client)
>      HoodieFlinkWriteClient.commit(String, List<WriteStatus>, Option<Map<String, String>>, String, Map<String, List<String>>)  (org.apache.hudi.client)
>      HoodieJavaWriteClient.commit(String, List<WriteStatus>, Option<Map<String, String>>, String, Map<String, List<String>>)  (org.apache.hudi.client)
> {code}
> As 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline#createImmutableFileInPath
>  shows below, completing the write takes three steps: A. create the file, 
> B. write the data, C. close the file handle. If `StreamReadMonitoring` lists 
> this deltacommit file between steps A and B, while its content is still 
> empty, it reads nothing in that loop iteration. IMO streaming reads could 
> therefore lose some committed data. 
>  
> {code:java}
>   private void createImmutableFileInPath(Path fullPath, Option<byte[]> content) {
>     FSDataOutputStream fsout = null;
>     try {
>       // Step A: create the file under its final name (overwrite disabled).
>       fsout = metaClient.getFs().create(fullPath, false);
>       if (content.isPresent()) {
>         // Step B: write the commit metadata.
>         fsout.write(content.get());
>       }
>     } catch (IOException e) {
>       throw new HoodieIOException("Failed to create file " + fullPath, e);
>     } finally {
>       try {
>         if (null != fsout) {
>           // Step C: close the file handle.
>           fsout.close();
>         }
>       } catch (IOException e) {
>         throw new HoodieIOException("Failed to close file " + fullPath, e);
>       }
>     }
>   }
> {code}
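
The window between steps A and B can be reproduced with a minimal sketch. This is not Hudi code: it uses `java.nio.file` on a local filesystem as a stand-in for HDFS, and the class name, method name, and the file name `001.deltacommit` are hypothetical. It interleaves a read between create and write in a single thread, so the result does not depend on timing.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: local-filesystem analogue of the create-write-close flow.
final class CreateWriteCloseWindow {

  /** Returns {bytes visible between steps A and B, bytes visible after step C}. */
  static int[] observe(byte[] content) {
    try {
      Path dir = Files.createTempDirectory("timeline-sketch");
      Path instantFile = dir.resolve("001.deltacommit"); // hypothetical instant file name
      int seenEarly;
      // Step A: create the file; it is visible under its final name right away.
      try (OutputStream out = Files.newOutputStream(instantFile, StandardOpenOption.CREATE_NEW)) {
        // A reader that finds the file now sees no content yet.
        seenEarly = Files.readAllBytes(instantFile).length;
        // Step B: write the commit metadata.
        out.write(content);
      } // Step C: the try-with-resources block closes the handle.
      int seenLate = Files.readAllBytes(instantFile).length;
      return new int[] {seenEarly, seenLate};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] seen = observe("{\"partitionToWriteStats\":{}}".getBytes(StandardCharsets.UTF_8));
    System.out.println("between A and B: " + seen[0] + " bytes, after C: " + seen[1] + " bytes");
  }
}
```

A reader that treats the mere presence of the file as a completed commit would parse empty metadata in the first case.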
> To avoid this corner case, I think we should rely on a `rename` operation 
> to complete the commit rather than the create-write-close flow. Please 
> correct me if I missed something.
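
The rename-based flow proposed above could look roughly like the following sketch. It is not the fix that went into Hudi: it again uses `java.nio.file` on a local filesystem instead of the Hadoop `FileSystem` API, and the class name, the `commitViaRename` helper, and the `.tmp` suffix are assumptions made for illustration. It also assumes readers ignore files ending in `.tmp`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative only: publish a commit file with write-to-temp followed by rename.
final class RenameCommitSketch {

  /** Writes content to a temporary sibling, then moves it to finalPath in one step. */
  static void commitViaRename(Path finalPath, byte[] content) {
    // Hypothetical naming scheme for the temporary file.
    Path tmpPath = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");
    try {
      // Mimics create(fullPath, false): refuse to overwrite an existing commit.
      // Note that this check is not atomic with the move below.
      if (Files.exists(finalPath)) {
        throw new IllegalStateException("Commit file already exists: " + finalPath);
      }
      // Create, write, and close happen on the temporary file only.
      Files.write(tmpPath, content);
      // The final name becomes visible only with complete content.
      Files.move(tmpPath, finalPath, StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("timeline-sketch");
    Path instantFile = dir.resolve("001.deltacommit"); // hypothetical instant file name
    commitViaRename(instantFile, "{}".getBytes(StandardCharsets.UTF_8));
    System.out.println("committed bytes: " + Files.readAllBytes(instantFile).length);
  }
}
```

Whether a rename is atomic depends on the storage layer, so the same idea would need to be checked against each filesystem Hudi supports.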



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
