[
https://issues.apache.org/jira/browse/HUDI-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-3599:
-----------------------------
Priority: Critical (was: Major)
> Not atomicity commit could cause streaming read loss data
> ---------------------------------------------------------
>
> Key: HUDI-3599
> URL: https://issues.apache.org/jira/browse/HUDI-3599
> Project: Apache Hudi
> Issue Type: Bug
> Components: core
> Reporter: Xiaoqiao He
> Priority: Critical
> Fix For: 0.12.1
>
>
> The current `commit` call hierarchy is shown below. `transitionState` writes
> the deltacommit file to complete the commit, but writing a file is not an
> atomic operation on some file systems, HDFS for instance.
> {code:java}
> HoodieActiveTimeline.transitionState(HoodieInstant, HoodieInstant, Option<byte[]>, boolean) (org.apache.hudi.common.table.timeline)
> HoodieActiveTimeline.transitionState(HoodieInstant, HoodieInstant, Option<byte[]>) (org.apache.hudi.common.table.timeline)
> HoodieActiveTimeline.saveAsComplete(HoodieInstant, Option<byte[]>) (org.apache.hudi.common.table.timeline)
> BaseHoodieWriteClient.commit(HoodieTable, String, String, HoodieCommitMetadata, List<HoodieWriteStat>) (org.apache.hudi.client)
> BaseHoodieWriteClient.commitStats(String, List<HoodieWriteStat>, Option<Map<String, String>>, String, Map<String, List<String>>) (org.apache.hudi.client)
> HoodieFlinkWriteClient.commit(String, List<WriteStatus>, Option<Map<String, String>>, String, Map<String, List<String>>) (org.apache.hudi.client)
> HoodieJavaWriteClient.commit(String, List<WriteStatus>, Option<Map<String, String>>, String, Map<String, List<String>>) (org.apache.hudi.client)
> {code}
> As
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline#createImmutableFileInPath
> shows below, completing the write takes three steps: A. create the file,
> B. write the data, C. close the file handle. If `StreamReadMonitoring` lists
> this deltacommit file between steps A and B, while its content is still
> empty, it reads nothing in that loop iteration. IMO this could cause stream
> reads to lose some commit data.
>
> {code:java}
> private void createImmutableFileInPath(Path fullPath, Option<byte[]> content) {
>   FSDataOutputStream fsout = null;
>   try {
>     fsout = metaClient.getFs().create(fullPath, false);
>     if (content.isPresent()) {
>       fsout.write(content.get());
>     }
>   } catch (IOException e) {
>     throw new HoodieIOException("Failed to create file " + fullPath, e);
>   } finally {
>     try {
>       if (null != fsout) {
>         fsout.close();
>       }
>     } catch (IOException e) {
>       throw new HoodieIOException("Failed to close file " + fullPath, e);
>     }
>   }
> }
> {code}
> To avoid this corner case, I think we should rely on a `rename` operation to
> complete the commit rather than the create-write-close flow. Please correct
> me if I missed something.
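
The write-then-rename idea proposed above can be sketched as follows. This is only an illustration under stated assumptions, not Hudi's actual fix: it uses `java.nio.file` on a local file system instead of the Hadoop `FileSystem` API, and the class name `AtomicCommitSketch`, the helper `writeThenRename`, the `.tmp` suffix, and the sample file name are all hypothetical. The point is that steps A to C happen on a temporary path, so a reader listing the final path sees either no file or the complete file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicCommitSketch {

    /**
     * Hypothetical helper (not a Hudi API): publishes content at finalPath by
     * writing a temporary sibling file first and then renaming it into place.
     */
    public static void writeThenRename(Path finalPath, byte[] content) throws IOException {
        // Hypothetical naming scheme for the in-progress file.
        Path tmpPath = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");

        // Steps A-C (create, write, close) touch only the temporary path.
        Files.write(tmpPath, content);

        // Publish in a single step. ATOMIC_MOVE throws
        // AtomicMoveNotSupportedException if the file system cannot do this
        // atomically. Note: whether an existing target is replaced is
        // implementation specific, unlike create(fullPath, false).
        Files.move(tmpPath, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("timeline");
        Path commitFile = dir.resolve("20220310.deltacommit"); // sample name only

        writeThenRename(commitFile, "commit metadata".getBytes(StandardCharsets.UTF_8));

        // prints "commit metadata"
        System.out.println(new String(Files.readAllBytes(commitFile), StandardCharsets.UTF_8));
    }
}
```

Whether a rename is atomic depends on the underlying storage, so an equivalent change in Hudi would need to be checked against each supported file system.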