Yun Tang created FLINK-11868:
--------------------------------
Summary: [filesystems] Introduce listStatusIterator API to file system
Key: FLINK-11868
URL: https://issues.apache.org/jira/browse/FLINK-11868
Project: Flink
Issue Type: Improvement
Components: FileSystems
Reporter: Yun Tang
Assignee: Yun Tang
Fix For: 1.9.0
From experience, we know that {{listStatus}} is expensive on many distributed
file systems, especially when the directory contains a large number of files.
The method not only blocks the calling thread until the full result is
returned, but can also cause an OOM because the returned array of
{{FileStatus}} can be very large. I think we should learn from FLINK-7266 and
FLINK-8540.
However, listing the file statuses under a path is really helpful in many
situations. Fortunately, many distributed file systems have recognized this and
provide an API such as
{{[listStatusIterator|https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#listStatusIterator(org.apache.hadoop.fs.Path)]}}
that queries the file system on demand.
We should also introduce this API and use it to replace the current
implementations that rely on {{listStatus}}.
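To illustrate the on-demand idea, here is a minimal, self-contained sketch. It is
not Flink or Hadoop code: {{StatusIterator}}, {{PageFetcher}},
{{PagedStatusIterator}} and {{fakeDirectory}} are hypothetical names, and plain
strings stand in for {{FileStatus}}. The iterator shape is only modeled on
Hadoop's {{RemoteIterator}}, whose {{hasNext}}/{{next}} may throw
{{IOException}}. The actual API introduced for this issue may look different.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

final class ListStatusIteratorSketch {

    /** Hypothetical iterator type, modeled on Hadoop's RemoteIterator. */
    interface StatusIterator<T> {
        boolean hasNext() throws IOException;
        T next() throws IOException;
    }

    /** Hypothetical stand-in for one remote listing call that returns a single page. */
    interface PageFetcher {
        List<String> fetch(int offset, int limit) throws IOException;
    }

    /** Fetches entries page by page, so at most pageSize entries are buffered at once. */
    static final class PagedStatusIterator implements StatusIterator<String> {
        private final PageFetcher fetcher;
        private final int pageSize;
        private List<String> page = new ArrayList<>();
        private int posInPage = 0;
        private int offset = 0;
        private boolean exhausted = false;

        PagedStatusIterator(PageFetcher fetcher, int pageSize) {
            this.fetcher = fetcher;
            this.pageSize = pageSize;
        }

        @Override
        public boolean hasNext() throws IOException {
            if (posInPage < page.size()) {
                return true; // still serving the buffered page
            }
            if (exhausted) {
                return false;
            }
            // Buffer is drained: ask the (fake) file system for the next page only now.
            page = fetcher.fetch(offset, pageSize);
            posInPage = 0;
            offset += page.size();
            if (page.size() < pageSize) {
                exhausted = true; // a short page means there is nothing left to list
            }
            return !page.isEmpty();
        }

        @Override
        public String next() throws IOException {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return page.get(posInPage++);
        }
    }

    /** In-memory fake directory that counts how many listing calls were made. */
    static PageFetcher fakeDirectory(int totalFiles, AtomicInteger calls) {
        return (offset, limit) -> {
            calls.incrementAndGet();
            List<String> out = new ArrayList<>();
            for (int i = offset; i < Math.min(totalFiles, offset + limit); i++) {
                out.add("file-" + i);
            }
            return out;
        };
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger calls = new AtomicInteger();
        StatusIterator<String> it = new PagedStatusIterator(fakeDirectory(10, calls), 3);
        // Read only the first four entries and stop early.
        for (int i = 0; i < 4 && it.hasNext(); i++) {
            System.out.println(it.next());
        }
        System.out.println("listing calls made: " + calls.get());
    }
}
```

In this sketch a caller that stops after four of ten entries triggers only two
listing calls and never holds more than one page in memory, whereas an
array-returning {{listStatus}} would have to materialize every entry first.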
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)