ManManson opened a new pull request, #13796:
URL: https://github.com/apache/arrow/pull/13796
This header makes use of `int8_t`, which is defined in the `<cstdint>`
system header.
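A minimal illustration of the include fix, using a hypothetical header (not the file touched by this PR). A header that names a fixed-width integer type includes `<cstdint>` itself, rather than relying on another header to pull it in transitively:

```cpp
// example.h (hypothetical): EntryKind uses a fixed-width integer type,
// so the header includes <cstdint> directly instead of depending on a
// transitive include from some other header.
#include <cstdint>

enum class EntryKind : std::int8_t { kFile = 0, kDirectory = 1 };
```

Note that `<cstdint>` guarantees the name `std::int8_t`. The unqualified `int8_t` is guaranteed only by `<stdint.h>`, although most implementations also expose it globally.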
Introduce a helper class `AsyncStatSelector`, which provides
an optimized specialization of `GetFileInfoGenerator` for the
`LocalFileSystem` class.
Two variants of async discovery functions are supported:
1. `DiscoverPartitionFiles`, which parallelizes traversal of
individual directories so that each directory's results are
yielded as a separate `FileInfoGenerator`. Each generator is
backed by a `DiscoveryImplIterator` that delivers items in
chunks (the default chunk size is `kBatchSize == 1K` items).
2. `DiscoverPartitionsFlattened`, which forwards execution to
`DiscoverPartitionFiles`, except that the results from the
individual sub-directory iterators are merged into a single
`FileInfoGenerator` stream.
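The two variants above can be modeled without Arrow. This is a minimal, non-authoritative sketch that assumes a simplified `FileInfo` (path only) and represents a generator as a `std::function` that returns a `std::future` of the next chunk, with an empty chunk marking end-of-stream. `Ready`, `MakeChunkedGenerator` and `FlattenGenerators` are hypothetical names. `MakeChunkedGenerator` mimics a per-directory iterator that hands out at most `batch_size` entries per call. `FlattenGenerators` drains per-directory generators one after another into one stream, roughly how a flattened result looks when no readahead is configured.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a file entry: only the path is kept.
struct FileInfo {
  std::string path;
};

// Stand-in generator type: each call returns a future holding the next
// chunk; an empty chunk signals end-of-stream (assumption of this sketch).
using FileInfoVector = std::vector<FileInfo>;
using FileInfoGenerator = std::function<std::future<FileInfoVector>()>;

// Wraps an already-computed chunk in a ready future.
std::future<FileInfoVector> Ready(FileInfoVector chunk) {
  std::promise<FileInfoVector> promise;
  promise.set_value(std::move(chunk));
  return promise.get_future();
}

// Hypothetical per-directory generator: yields `entries` in chunks of
// at most `batch_size` items.
FileInfoGenerator MakeChunkedGenerator(FileInfoVector entries,
                                       std::size_t batch_size) {
  assert(batch_size > 0);
  struct State {
    FileInfoVector entries;
    std::size_t pos = 0;
  };
  auto state = std::make_shared<State>(State{std::move(entries)});
  return [state, batch_size]() {
    FileInfoVector chunk;
    while (state->pos < state->entries.size() && chunk.size() < batch_size) {
      chunk.push_back(state->entries[state->pos++]);
    }
    return Ready(std::move(chunk));
  };
}

// Hypothetical flattening step: forwards chunks from each sub-generator
// in order, so the consumer sees a single stream.
FileInfoGenerator FlattenGenerators(std::vector<FileInfoGenerator> gens) {
  struct State {
    std::vector<FileInfoGenerator> gens;
    std::size_t current = 0;
  };
  auto state = std::make_shared<State>(State{std::move(gens)});
  return [state]() {
    while (state->current < state->gens.size()) {
      FileInfoVector chunk = state->gens[state->current]().get();
      if (!chunk.empty()) return Ready(std::move(chunk));
      ++state->current;  // this sub-generator is exhausted; move on
    }
    return Ready({});  // empty chunk ends the flattened stream
  };
}
```

Blocking on `.get()` keeps the sketch short. A truly asynchronous implementation would chain continuations on the futures instead of waiting on them.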
The implementation makes use of additional attributes in
`FileSelector`, such as `partitions_readahead`, which can be used
to tune the algorithm's behavior by adjusting how many directories
can be processed in parallel. This option is disabled by default,
so individual partitions are processed serially via
`MakeConcatenatedGenerator` under the hood.
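The effect of a readahead knob like `partitions_readahead` can be illustrated with a standalone sketch. It assumes each directory listing is a plain callable (`ListDirFn` and `ListDirectories` are hypothetical names) and uses `std::async` as a stand-in for whatever executor Arrow actually uses. A readahead of 0 lists directories serially. A readahead of N keeps up to N listings running in the background while results are still consumed in directory order.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-directory listing task; a real filesystem would stat
// the directory's entries here. It simply returns a list of paths.
using ListDirFn = std::function<std::vector<std::string>()>;

// readahead == 0: list directories one after another on this thread.
// readahead == N: keep up to N listings in flight, consuming the oldest
// first so results come back in directory order.
std::vector<std::vector<std::string>> ListDirectories(
    const std::vector<ListDirFn>& dirs, std::size_t readahead) {
  std::vector<std::vector<std::string>> results;
  if (readahead == 0) {
    for (const auto& list : dirs) results.push_back(list());
    return results;
  }
  std::deque<std::future<std::vector<std::string>>> in_flight;
  std::size_t next = 0;
  while (next < dirs.size() || !in_flight.empty()) {
    // Top up the window of concurrently running listings.
    while (next < dirs.size() && in_flight.size() < readahead) {
      in_flight.push_back(std::async(std::launch::async, dirs[next++]));
    }
    // Wait for the oldest listing to preserve directory order.
    results.push_back(in_flight.front().get());
    in_flight.pop_front();
  }
  return results;
}
```

Bounding the window trades memory and open handles for throughput. This is presumably why such an option would default to serial processing, although the PR itself only states that it is disabled by default.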
Tests: unit(release)
Signed-off-by: Pavel Solodovnikov <[email protected]>
Co-Authored-by: Igor Seliverstov <[email protected]>