[
https://issues.apache.org/jira/browse/HUDI-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-7209:
---------------------------------
Labels: pull-request-available (was: )
> Add configuration to skip not exists file in streaming read
> -----------------------------------------------------------
>
> Key: HUDI-7209
> URL: https://issues.apache.org/jira/browse/HUDI-7209
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: Ruguo Yu
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: 289447957-f25cda8d-e75c-4380-b660-8ad347c4a6ca.png
>
>
> In `streaming read`, if there are a large number of files in the metadata,
> especially very old archived files, then checking whether each file exists
> during file traversal is IO-intensive. In extreme cases, the Flink
> checkpoint may fail to complete.
> !289447957-f25cda8d-e75c-4380-b660-8ad347c4a6ca.png|width=697,height=562!
> Another potential problem: if deleted files are skipped by default, could
> data go missing without the user being aware of it?
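
The trade-off raised in the description can be illustrated with a minimal sketch. It is not Hudi's implementation: the class `MissingFileFilter` and the flag `skipMissing` are hypothetical names, and it uses `java.nio.file` on a local filesystem instead of the storage layer Hudi actually reads from. When the flag is off, a missing file fails fast; when it is on, missing files are skipped but collected, so the caller can log them and the user is not left unaware of potentially missing data.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hudi: splits candidate files into
// readable ones and ones that no longer exist.
final class MissingFileFilter {

    // Holds both outcomes so that skipped files stay visible to the caller.
    static final class Result {
        final List<Path> readable = new ArrayList<>();
        final List<Path> skipped = new ArrayList<>();
    }

    // skipMissing is an assumed flag standing in for the proposed configuration.
    static Result filter(List<Path> candidates, boolean skipMissing) {
        Result result = new Result();
        for (Path path : candidates) {
            if (Files.exists(path)) {
                result.readable.add(path);
            } else if (skipMissing) {
                // Record the skip instead of dropping the file silently.
                result.skipped.add(path);
            } else {
                throw new IllegalStateException(
                    "File referenced by metadata no longer exists: " + path);
            }
        }
        return result;
    }
}
```

A caller would read only `result.readable` and emit a warning for every entry in `result.skipped`. Note that this sketch still performs one existence check per file, so it addresses only the silent-data-loss concern, not the IO cost described above.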
--
This message was sent by Atlassian Jira
(v8.20.10#820010)