[ https://issues.apache.org/jira/browse/ARROW-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Molina updated ARROW-9748:
-------------------------------------
Fix Version/s: 8.0.0
(was: 7.0.0)
> [C++][Dataset] Remove Selector, ignore_prefixes from FileSystemDatasetFactory
> -----------------------------------------------------------------------------
>
> Key: ARROW-9748
> URL: https://issues.apache.org/jira/browse/ARROW-9748
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 1.0.0
> Reporter: Ben Kietzman
> Assignee: Weston Pace
> Priority: Major
> Labels: dataset
> Fix For: 8.0.0
>
>
> Currently FileSystemDatasetFactory can be constructed with an explicit
> listing of files or with a {{fs::FileSelector}}. Since the selector does not
> support sophisticated selection criteria,
> {{FileSystemFactoryOptions::selector_ignore_prefixes}} was added to allow
> users to exclude undesired files such as {{_metadata}} or {{.DS_STORE}}.
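A minimal C++17 sketch of the ignore-prefix rule described above, for illustration only. `HasIgnoredPrefix` is a hypothetical helper, not Arrow's implementation. It assumes that only the file's basename is compared against the prefixes and that `/` is the separator. This issue does not say whether Arrow also checks parent directory components.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper modelling the ignore-prefix rule: a file is skipped when
// its basename begins with any configured prefix.
// Assumption: only the final path component is checked; '/' is the separator.
bool HasIgnoredPrefix(std::string_view path,
                      const std::vector<std::string>& ignore_prefixes) {
  // Take the component after the last '/' (the whole path if there is none).
  const auto slash = path.rfind('/');
  const std::string_view basename =
      slash == std::string_view::npos ? path : path.substr(slash + 1);
  for (const auto& prefix : ignore_prefixes) {
    // substr clamps to the basename length, so short names compare safely.
    if (basename.substr(0, prefix.size()) == prefix) return true;
  }
  return false;
}
```

Under this assumed basename-only rule, `dataset/_staging/part-0.parquet` is kept even though it sits in an underscore-prefixed directory. A single fixed prefix list forces every such case to be decided the same way for all users.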
> The selector + ignored prefixes mechanism is inflexible and has numerous
> edge cases (see ARROW-9644 and ARROW-9573). Furthermore, implementing more
> advanced file selection logic inside dataset discovery would prevent it from
> being reused by other consumers of the filesystem API.
> Remove FileSystemDatasetFactory's constructor-from-selector, optionally
> adding the ignore-prefix filtering directly to {{fs::FileSelector}}. An
> explicit listing of files for use in constructing a FileSystemDatasetFactory
> can then be assembled using an {{fs::FileSelector}} and/or other globbing
> libraries, with arbitrary inclusion logic.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)