Pavel Solodovnikov created ARROW-17306:
------------------------------------------

             Summary: Provide an optimized `GetFileInfoGenerator` specialization 
for `LocalFileSystem`
                 Key: ARROW-17306
                 URL: https://issues.apache.org/jira/browse/ARROW-17306
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: C++
            Reporter: Pavel Solodovnikov
            Assignee: Pavel Solodovnikov


At the moment, `LocalFileSystem` has no optimized implementation of 
`GetFileInfoGenerator` of its own and falls back to the generic 
`FileSystem::GetFileInfoGenerator`, which simply queues the synchronous 
`GetFileInfo(FileSelector)` call on a background thread and waits for it to 
complete before yielding anything.
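
To make the consequence concrete, here is a minimal, simplified sketch of that 
single-shot behaviour. It is not Arrow code: `PathBatch`, `BatchGenerator`, 
`ListAllSync` and `MakeSingleShotGenerator` are hypothetical names, plain path 
strings stand in for `FileInfo`, `std::filesystem` stands in for the local 
filesystem layer, and the background thread and future are left out so that 
only the "one big batch" shape remains.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for a batch of FileInfo items; not Arrow's type.
using PathBatch = std::vector<std::string>;
// A generator returns the next batch, or std::nullopt once it is exhausted.
using BatchGenerator = std::function<std::optional<PathBatch>()>;

// Synchronous listing: walks the whole tree before returning anything.
PathBatch ListAllSync(const fs::path& root) {
  PathBatch out;
  for (const auto& entry : fs::recursive_directory_iterator(root)) {
    out.push_back(entry.path().string());
  }
  return out;
}

// Single-shot generator: the first call blocks until the full listing is
// done and yields it as one batch; every later call signals exhaustion.
BatchGenerator MakeSingleShotGenerator(const fs::path& root) {
  assert(fs::is_directory(root));  // sketch-level precondition
  return [root, done = false]() mutable -> std::optional<PathBatch> {
    if (done) return std::nullopt;
    done = true;
    return ListAllSync(root);
  };
}
```

With a generator of this shape, a consumer cannot start working on the first 
entry until the last one has been discovered.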

This largely defeats the purpose of `GetFileInfoGenerator`: the `FileInfo` 
items cannot be pushed down "on the fly" to whatever consumer needs them 
(e.g. `FileSystemDatasetFactory` and `FileSystemDataset`).

Provide a proper implementation that yields more than once and allows the 
data to be retrieved in chunks, so that the resulting `FileInfoGenerator` is 
usable for streaming processing of data.
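
One possible shape for such a chunked generator is sketched below. This is an 
illustration only, not the proposed Arrow implementation: `PathBatch`, 
`BatchGenerator`, `MakeChunkedGenerator` and the `chunk_size` parameter are 
hypothetical, plain path strings stand in for `FileInfo`, and the walk is done 
synchronously with `std::filesystem` rather than through Arrow's async 
machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for a batch of FileInfo items; not Arrow's type.
using PathBatch = std::vector<std::string>;
// A generator returns the next batch, or std::nullopt once it is exhausted.
using BatchGenerator = std::function<std::optional<PathBatch>()>;

// Chunked generator: each call resumes the directory walk where the previous
// call stopped and yields at most `chunk_size` entries.
BatchGenerator MakeChunkedGenerator(const fs::path& root,
                                    std::size_t chunk_size) {
  assert(chunk_size > 0);
  // The directory iterator is the only state carried between calls.
  auto it = fs::recursive_directory_iterator(root);
  return [it, chunk_size]() mutable -> std::optional<PathBatch> {
    const fs::recursive_directory_iterator end;
    if (it == end) return std::nullopt;  // nothing left to list
    PathBatch batch;
    batch.reserve(chunk_size);
    while (it != end && batch.size() < chunk_size) {
      batch.push_back(it->path().string());
      ++it;
    }
    return batch;
  };
}
```

A consumer can then loop with `while (auto batch = gen()) { ... }` and start 
processing the first chunk while the rest of the tree is still unlisted.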

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
