ManManson opened a new pull request, #13796:
URL: https://github.com/apache/arrow/pull/13796

   This header makes use of `int8_t`, which is defined in the `<cstdint>`
   system header.
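
A minimal illustration of this kind of include fix. The header and struct names below are hypothetical, because the affected header is not named here. A header that names `int8_t` includes `<cstdint>` itself instead of relying on some other header to pull it in transitively:

```cpp
// example_flags.h: a hypothetical header used only for illustration.
// It names std::int8_t, so it includes <cstdint> directly instead of
// relying on another header to pull it in transitively.
#ifndef EXAMPLE_FLAGS_H_
#define EXAMPLE_FLAGS_H_

#include <cstdint>

struct PackedFlags {
  std::int8_t level;  // fixed-width 8-bit signed integer from <cstdint>
};

#endif  // EXAMPLE_FLAGS_H_
```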
   
   Introduce a helper class `AsyncStatSelector`, which contains an
   optimized specialization of `GetFileInfoGenerator` for the
   `LocalFileSystem` class.
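
The delegation pattern described above can be sketched as a toy model. This is only a sketch under stated assumptions: `FileSystemBase`, `LocalFs`, `LocalStatSelector`, and `DescribeDiscovery` are hypothetical names, not Arrow's classes, and plain strings stand in for the real `FileInfoGenerator` results. It shows a local filesystem overriding a default virtual discovery method and delegating to a dedicated helper class:

```cpp
#include <string>

// Hypothetical, heavily simplified stand-ins; these are NOT Arrow's classes.
struct FileSelector {
  std::string base_dir;
};

class FileSystemBase {
 public:
  virtual ~FileSystemBase() = default;
  // Default discovery path inherited by every filesystem.
  virtual std::string DescribeDiscovery(const FileSelector& sel) const {
    return "generic listing of " + sel.base_dir;
  }
};

// Hypothetical helper that groups the local-only discovery logic,
// loosely modeled on the role described for AsyncStatSelector.
struct LocalStatSelector {
  static std::string Discover(const FileSelector& sel) {
    return "batched local listing of " + sel.base_dir;
  }
};

class LocalFs : public FileSystemBase {
 public:
  // The local filesystem overrides the default and delegates to the helper.
  std::string DescribeDiscovery(const FileSelector& sel) const override {
    return LocalStatSelector::Discover(sel);
  }
};
```

Keeping the specialized logic in a separate helper leaves the generic path untouched for other filesystems, while callers holding a base-class reference transparently get the local fast path.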
   
   Two variants of the async discovery function are supported:
   1. `DiscoverPartitionFiles`, which can traverse individual
      directories in parallel and yields each directory's results as
      a separate `FileInfoGenerator`. Each generator is backed by a
      `DiscoveryImplIterator` that delivers items in chunks (by
      default `kBatchSize`, i.e. 1K items).
   2. `DiscoverPartitionsFlattened`, which forwards execution to
      `DiscoverPartitionFiles`, except that the results from the
      individual sub-directory iterators are merged into a single
      `FileInfoGenerator` stream.
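
The two discovery variants can be modeled with a simplified, synchronous sketch. Assumptions: `Gen`, `Batch`, `MakeBatchedGen`, `MakePerDirectoryGen`, and `Concatenate` are hypothetical names. Directory contents are passed in as string vectors rather than read from disk. A plain `std::function` returning `std::optional` stands in for Arrow's future-returning async generators. Each directory yields its own stream of batches (the per-directory variant), and `Concatenate` drains those streams one after another into a single stream (the flattened variant):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical synchronous model: a "generator" returns the next item, or
// std::nullopt at end of stream. Arrow's async generators return futures instead.
template <typename T>
using Gen = std::function<std::optional<T>()>;

using Batch = std::vector<std::string>;  // one chunk of directory entries

// Yields one directory's entries in chunks of at most batch_size
// (the role described for DiscoveryImplIterator with kBatchSize).
Gen<Batch> MakeBatchedGen(std::vector<std::string> entries, std::size_t batch_size) {
  batch_size = std::max<std::size_t>(batch_size, 1);  // avoid an endless stream of empty chunks
  auto data = std::make_shared<std::vector<std::string>>(std::move(entries));
  auto pos = std::make_shared<std::size_t>(0);
  return [data, pos, batch_size]() -> std::optional<Batch> {
    if (*pos >= data->size()) return std::nullopt;
    std::size_t end = std::min<std::size_t>(*pos + batch_size, data->size());
    Batch out(data->begin() + *pos, data->begin() + end);
    *pos = end;
    return out;
  };
}

// One inner generator per directory (cf. DiscoverPartitionFiles).
Gen<Gen<Batch>> MakePerDirectoryGen(std::vector<std::vector<std::string>> dirs,
                                    std::size_t batch_size) {
  auto data = std::make_shared<std::vector<std::vector<std::string>>>(std::move(dirs));
  auto idx = std::make_shared<std::size_t>(0);
  return [data, idx, batch_size]() -> std::optional<Gen<Batch>> {
    if (*idx >= data->size()) return std::nullopt;
    return MakeBatchedGen((*data)[(*idx)++], batch_size);
  };
}

// Drains each inner generator in order, producing one flat stream
// (cf. DiscoverPartitionsFlattened and serial concatenation).
template <typename T>
Gen<T> Concatenate(Gen<Gen<T>> outer) {
  auto current = std::make_shared<std::optional<Gen<T>>>();
  return [outer, current]() -> std::optional<T> {
    while (true) {
      if (!current->has_value()) {
        *current = outer();                              // next directory
        if (!current->has_value()) return std::nullopt;  // no more directories
      }
      if (std::optional<T> item = (**current)()) return item;
      current->reset();  // this directory is exhausted
    }
  };
}
```

With a batch size of 2, the directories `{a, b, c}`, `{}`, and `{d}` flatten into the batches `{a, b}`, `{c}`, `{d}`; the empty directory contributes nothing.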
   
   The implementation uses additional attributes in `FileSelector`,
   such as `partitions_readahead`, which tunes the algorithm by
   controlling how many directories are processed in parallel. This
   option is disabled by default, so individual partitions are
   processed serially via `MakeConcatenatedGenerator` under the hood.
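
A simplified sketch of what a readahead setting such as `partitions_readahead` controls: how many directory listings may run concurrently. Assumptions: `ListAll`, `ListFn`, and `Listing` are hypothetical names, `std::async` threads stand in for Arrow's executor, and this version returns results in directory order, which the parallel path described above need not do. With the window set to 0 it degenerates to serial, one-directory-at-a-time processing:

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical: listing a directory is simulated by a callable returning its entries.
using Listing = std::vector<std::string>;
using ListFn = std::function<Listing()>;

// Lists every directory, keeping at most `readahead` listings in flight at once.
// readahead == 0 means "disabled": directories are listed strictly one by one.
// Results are always returned in the original directory order.
std::vector<Listing> ListAll(const std::vector<ListFn>& dirs, std::size_t readahead) {
  std::vector<Listing> results;
  if (readahead == 0) {
    for (const ListFn& list : dirs) results.push_back(list());
    return results;
  }
  std::deque<std::future<Listing>> in_flight;
  std::size_t next = 0;
  while (next < dirs.size() || !in_flight.empty()) {
    // Top up the window of concurrently running listings.
    while (next < dirs.size() && in_flight.size() < readahead) {
      in_flight.push_back(std::async(std::launch::async, dirs[next++]));
    }
    // Wait for the oldest listing so output order matches input order.
    results.push_back(in_flight.front().get());
    in_flight.pop_front();
  }
  return results;
}
```

Keeping results ordered makes the output deterministic at the cost of head-of-line blocking: one slow directory delays delivery of finished listings queued behind it. An unordered merge avoids that stall but yields directories in completion order.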
   
   Tests: unit(release)
   
   Signed-off-by: Pavel Solodovnikov <[email protected]>
   Co-Authored-by: Igor Seliverstov <[email protected]>

