Pavel Solodovnikov created ARROW-17306:
------------------------------------------
Summary: Provide an optimized `GetFileInfoGenerator` specialization
for `LocalFileSystem`
Key: ARROW-17306
URL: https://issues.apache.org/jira/browse/ARROW-17306
Project: Apache Arrow
Issue Type: Sub-task
Components: C++
Reporter: Pavel Solodovnikov
Assignee: Pavel Solodovnikov
At the moment, `LocalFileSystem` does not have its own optimized
implementation of `GetFileInfoGenerator` and falls back to the generic
`FileSystem::GetFileInfoGenerator`, which simply queues the synchronous
`GetFileInfo(FileSelector)` call on a background thread and waits for it to
complete before yielding anything.
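
To make the problem concrete, here is a minimal standalone model of that fallback behavior. It does not use Arrow: `ListAllSync` and `ListAllInBackground` are hypothetical names, `std::filesystem` stands in for the local filesystem layer, and `std::async` stands in for the background thread. The point it illustrates is that the consumer sees nothing until the whole listing is done.

```cpp
#include <cassert>
#include <filesystem>
#include <future>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for the synchronous GetFileInfo(FileSelector):
// walks one directory (non-recursively) and collects every entry path.
std::vector<std::string> ListAllSync(const fs::path& dir) {
  assert(fs::is_directory(dir));
  std::vector<std::string> out;
  for (const auto& entry : fs::directory_iterator(dir)) {
    out.push_back(entry.path().string());
  }
  return out;
}

// Hypothetical model of the generic fallback: the whole synchronous
// listing runs as one background task, so the result arrives as a
// single batch only after the listing has finished.
std::future<std::vector<std::string>> ListAllInBackground(fs::path dir) {
  return std::async(std::launch::async,
                    [dir = std::move(dir)] { return ListAllSync(dir); });
}
```

Calling `ListAllInBackground(dir).get()` blocks until every entry has been read, which is the "single yield" pattern described above.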
This largely defeats the purpose of `GetFileInfoGenerator`: we cannot really
use it to push `FileInfo` items down to a consumer "on the fly" (e.g.
`FileSystemDatasetFactory` and `FileSystemDataset`).
Provide a proper implementation that yields more than once and returns the
results in chunks, so that the resulting `FileInfoGenerator` is usable for
streaming processing of data.
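
As a rough illustration of the chunked behavior requested here, the sketch below reads a directory a few entries at a time. It is not the Arrow implementation: `Entry` and `ChunkedLister` are hypothetical names, the listing is non-recursive, and it is a synchronous pull-style interface rather than Arrow's asynchronous generator. It only shows that each call can hand back a partial result before the full listing is known.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical, simplified stand-in for Arrow's FileInfo.
struct Entry {
  std::string path;
  bool is_directory;
};

// Hypothetical pull-style lister: each Next() call reads at most
// `chunk_size` directory entries and returns them, or std::nullopt
// once the directory has been exhausted.
class ChunkedLister {
 public:
  ChunkedLister(const fs::path& dir, std::size_t chunk_size)
      : it_(dir), chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }

  std::optional<std::vector<Entry>> Next() {
    std::vector<Entry> chunk;
    // Advance the underlying iterator only as far as this chunk needs.
    while (it_ != fs::directory_iterator() && chunk.size() < chunk_size_) {
      chunk.push_back({it_->path().string(), it_->is_directory()});
      ++it_;
    }
    if (chunk.empty()) return std::nullopt;
    return chunk;
  }

 private:
  fs::directory_iterator it_;
  std::size_t chunk_size_;
};
```

A consumer would loop with `while (auto chunk = lister.Next()) { ... }` and can start processing the first chunk while the rest of the directory is still unread.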
--
This message was sent by Atlassian Jira
(v8.20.10#820010)