Ruguo Yu created HUDI-7209:
------------------------------
Summary: Add configuration to skip not exists file in streaming
read
Key: HUDI-7209
URL: https://issues.apache.org/jira/browse/HUDI-7209
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: Ruguo Yu
In `streaming reading`, if the metadata references a large number of files,
especially very old archive files, then checking whether each file exists
during file traversal is IO-intensive. In extreme cases, the Flink checkpoint
may fail to complete.
<img width="1074" alt="image"
src="https://github.com/apache/hudi/assets/13013780/f25cda8d-e75c-4380-b660-8ad347c4a6ca">
Another potential problem: if deleted files are skipped by default, could data
be missed without the user being aware of it?
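
To make the trade-off concrete, here is a minimal sketch of how such a switch could behave. It is not Hudi code: the class `SkipMissingFilesSketch`, the `resolve` method, and the `checkExistence` flag are hypothetical names, and the local filesystem (`java.nio.file`) stands in for the table's storage. With the check enabled, missing files are dropped but recorded so they can be reported to the user. With the check disabled, no per-file IO is issued during traversal.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Hudi implementation.
public class SkipMissingFilesSketch {

    /** Files kept for reading, plus files dropped because they no longer exist. */
    public static final class Resolution {
        public final List<Path> toRead = new ArrayList<>();
        public final List<Path> skippedMissing = new ArrayList<>();
    }

    /**
     * Resolves the candidate files listed in metadata.
     *
     * @param candidates     file paths taken from metadata (assumed input)
     * @param checkExistence hypothetical config flag; when false, the per-file
     *                       existence check (one IO call per file) is skipped
     */
    public static Resolution resolve(List<Path> candidates, boolean checkExistence) {
        Resolution result = new Resolution();
        for (Path candidate : candidates) {
            if (!checkExistence || Files.exists(candidate)) {
                result.toRead.add(candidate);
            } else {
                // Record the skipped file so the skip can be logged or surfaced
                // instead of happening silently.
                result.skippedMissing.add(candidate);
            }
        }
        return result;
    }
}
```

Keeping `skippedMissing` separate is one way to address the second concern: the caller can log a warning or fail fast when the list is non-empty, so skipped data does not go unnoticed.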