[
https://issues.apache.org/jira/browse/HUDI-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449747#comment-17449747
]
Danny Chen commented on HUDI-2576:
----------------------------------
Thanks for the feedback [~liyuanzhao435]. The file is deleted by the marker-based
file cleaning that runs before the metadata commit, in the #finalizeWrite step of
the {{FlinkHoodieWriteClient}}.
My guess is that some metadata exception prevents the written files from being
reported correctly, so the coordinator diffs the markers against the reported
files and cleans up the file.
Are you using append-mode write? I saw you use the
{{HoodieRowDataCreateHandle}}.
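To make the "diff and clean" step concrete, here is a minimal, hypothetical sketch of marker-based reconciliation (not actual Hudi code): markers record every data file a writer started, commit metadata records the files that were successfully reported back, and any marked-but-unreported file is treated as a leftover and deleted. The class and method names are invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of marker-based file reconciliation during finalizeWrite.
// If a metadata exception keeps a written file out of the reported set, that
// file is diffed out here and deleted -- which is how a live parquet file
// can "disappear" and later fail the checkpoint with FileNotFoundException.
public class MarkerReconcileSketch {

    // Files that would be cleaned: marked as started, but never reported
    // in the commit metadata.
    static Set<String> filesToClean(Set<String> markedFiles, Set<String> reportedFiles) {
        Set<String> invalid = new HashSet<>(markedFiles);
        invalid.removeAll(reportedFiles);
        return invalid;
    }

    public static void main(String[] args) {
        Set<String> marked = new HashSet<>();
        marked.add("fileA_20211019091917.parquet"); // written but never reported
        marked.add("fileB_20211019091917.parquet");

        Set<String> reported = new HashSet<>();
        reported.add("fileB_20211019091917.parquet");

        // fileA is the one that gets deleted out from under the writer.
        System.out.println(filesToClean(marked, reported));
    }
}
```

If this is the failure mode, the fix is on the reporting side (ensuring write statuses reach the coordinator), not on the cleaner, which is behaving as designed.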
> Flink checkpoint fails because a parquet file is missing
> ----------------------------------------------------------
>
> Key: HUDI-2576
> URL: https://issues.apache.org/jira/browse/HUDI-2576
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Affects Versions: 0.10.0
> Reporter: liyuanzhao435
> Priority: Major
> Labels: flink, hudi
> Fix For: 0.11.0
>
> Attachments: error.txt
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> Hudi 0.10.0, Flink 1.13.1
> Sometimes when Flink takes a checkpoint, an error occurs showing that a Hudi
> parquet file is missing (File does not exist):
> *2021-10-19 09:20:03,796 INFO
> org.apache.hudi.io.storage.row.HoodieRowDataCreateHandle [] - start close
> hoodie row data*
> *2021-10-19 09:20:03,800 WARN org.apache.hadoop.hdfs.DataStreamer [] -
> DataStreamer Exception*
> *java.io.FileNotFoundException: File does not exist:
> /tmp/test_liyz2/aa/2ff301cc-8db2-478e-b707-e8f2327ba38f-0_0-1-4_20211019091917.parquet
> (inode 32234795) Holder DFSClient_NONMAPREDUCE_633610786_99 does not have
> any open files.*
> *at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2815)*
>
> For details, see the attachment.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)