Ruguo Yu created HUDI-7209:
------------------------------

             Summary: Add configuration to skip not exists file in streaming 
read
                 Key: HUDI-7209
                 URL: https://issues.apache.org/jira/browse/HUDI-7209
             Project: Apache Hudi
          Issue Type: Improvement
          Components: flink
            Reporter: Ruguo Yu


In streaming read, if the metadata references a large number of files,
especially archive files that are very old, then checking whether each file
exists during file traversal is IO-intensive. In extreme cases, the Flink
checkpoint may not complete.
Screenshot: https://github.com/apache/hudi/assets/13013780/f25cda8d-e75c-4380-b660-8ad347c4a6ca

Another potential problem: if deleted files are skipped by default, could data
be silently missed without the user being aware of it?
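
To make the trade-off concrete, here is a minimal sketch of how such a switch
could behave. It is not Hudi code: the function `filter_existing` and the flag
`skip_missing_files` are hypothetical names chosen for illustration, and plain
local paths stand in for files listed in table metadata. When skipping is
enabled, missing files are dropped but reported with a warning. When it is
disabled, the read fails loudly instead of losing data silently.

```python
import logging
import os
from typing import Iterable, List, Tuple

logger = logging.getLogger("streaming_read_sketch")


def filter_existing(
    paths: Iterable[str], skip_missing_files: bool
) -> Tuple[List[str], List[str]]:
    """Split paths into (existing, missing).

    skip_missing_files is a hypothetical flag, not an actual Hudi option.
    """
    existing: List[str] = []
    missing: List[str] = []
    for path in paths:
        # One existence check per file: this is the IO-heavy step.
        (existing if os.path.exists(path) else missing).append(path)

    if missing and not skip_missing_files:
        # Fail loudly so the user knows data may be incomplete.
        raise FileNotFoundError(
            f"{len(missing)} file(s) listed in metadata no longer exist, "
            f"for example: {missing[:3]}"
        )
    if missing:
        # Skipping is allowed, but never silent.
        logger.warning(
            "Skipped %d missing file(s) during streaming read", len(missing)
        )
    return existing, missing
```

Note that this sketch still performs one existence check per file, so it only
illustrates the visibility concern, not a way to reduce the IO cost.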



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
