Ben Kietzman created ARROW-9748:
-----------------------------------
Summary: [C++][Dataset] Remove Selector, ignore_prefixes from
FileSystemDatasetFactory
Key: ARROW-9748
URL: https://issues.apache.org/jira/browse/ARROW-9748
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 1.0.0
Reporter: Ben Kietzman
Fix For: 2.0.0
Currently FileSystemDatasetFactory can be constructed with an explicit listing
of files or with a {{fs::FileSelector}}. Since the selector does not support
sophisticated selection criteria,
{{FileSystemFactoryOptions::selector_ignore_prefixes}} is provided to allow
users to exclude undesired files such as {{_metadata}} or {{.DS_STORE}}.
The selector + ignored prefixes mechanism is inflexible and has numerous edge
cases (ARROW-9644, ARROW-9573). Furthermore, implementing more advanced file
selection logic in dataset discovery prevents it from being reused by other
consumers of the file system API.
Remove FileSystemDatasetFactory's constructor-from-selector, optionally adding
that functionality directly to {{fs::FileSelector}}. An explicit listing of
files for use in construction of a FileSystemDatasetFactory can then be
assembled using an {{fs::FileSelector}} and/or other globbing libraries, with
arbitrary inclusion logic.
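
As a rough illustration of assembling such an explicit listing outside of
dataset discovery, here is a minimal sketch that uses only the C++17 standard
library rather than Arrow itself. The helper names {{HasIgnoredPrefix}} and
{{ListDatasetFiles}} are hypothetical and not part of any Arrow API, and the
choice to prune whole directories whose name carries an ignored prefix is an
assumption of this sketch, not a statement about Arrow's behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace stdfs = std::filesystem;

// Hypothetical helper (not an Arrow API): true if `basename` starts with any
// of the non-empty strings in `prefixes`.
bool HasIgnoredPrefix(const std::string& basename,
                      const std::vector<std::string>& prefixes) {
  return std::any_of(prefixes.begin(), prefixes.end(),
                     [&](const std::string& p) {
                       return !p.empty() &&
                              basename.compare(0, p.size(), p) == 0;
                     });
}

// Hypothetical helper (not an Arrow API): recursively collect regular files
// under `base_dir`, skipping every file or directory whose own name starts
// with an ignored prefix. The result is sorted so it is deterministic.
std::vector<std::string> ListDatasetFiles(
    const stdfs::path& base_dir,
    const std::vector<std::string>& ignore_prefixes) {
  std::vector<std::string> paths;
  const stdfs::recursive_directory_iterator end;
  for (stdfs::recursive_directory_iterator it(base_dir); it != end; ++it) {
    const std::string name = it->path().filename().string();
    if (HasIgnoredPrefix(name, ignore_prefixes)) {
      // Assumption of this sketch: an ignored directory hides its contents.
      if (it->is_directory()) it.disable_recursion_pending();
      continue;
    }
    if (it->is_regular_file()) {
      paths.push_back(it->path().generic_string());
    }
  }
  std::sort(paths.begin(), paths.end());
  return paths;
}
```

A listing produced this way (or by any other globbing approach) would then be
handed to the paths-based construction of FileSystemDatasetFactory, so the
inclusion logic lives entirely with the caller and stays reusable by other
consumers.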
--
This message was sent by Atlassian Jira
(v8.3.4#803005)