[
https://issues.apache.org/jira/browse/ARROW-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Francois Saint-Jacques updated ARROW-8065:
------------------------------------------
Component/s: C++ - Dataset
> [C++][Dataset] Untangle Dataset, Fragment and ScanOptions
> ---------------------------------------------------------
>
> Key: ARROW-8065
> URL: https://issues.apache.org/jira/browse/ARROW-8065
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++ - Dataset
> Reporter: Francois Saint-Jacques
> Priority: Major
>
> We should be able to list fragments without going through the
> Scanner/ScanOptions hoops. This exposes a flaw in the current API, which
> requires a ScanOptions to create a Fragment. This is also a problem for
> ARROW-7824: why do we need a ScanOptions (read manifest) to write record
> batches to a given path?
> # Remove {{ScanOptions}} from Fragment's properties and move it into
> {{Fragment::Scan}} parameters.
> # Remove {{ScanOptions}} from {{Dataset::GetFragments}}. If required, we can
> still provide an alternate signature, e.g.
> {{Dataset::GetFragments(std::shared_ptr<Expression> predicate)}}, for sub-tree
> pruning in FileSystemDataset.
> # The {{Fragment}} constructor should take a schema (and store it as a
> property), usually extracted from the Dataset schema. Update the
> {{schema()}} method accordingly.
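
The three items above could look roughly like the following. This is only a sketch of the proposed shape, not the actual Arrow code: Schema, ScanOptions and Expression are toy stand-ins, the path-prefix "predicate" is an assumption made to keep the example small, and Scan returns an int instead of a real scan-task iterator.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the real Arrow classes.
struct Schema { std::vector<std::string> field_names; };
struct ScanOptions { int batch_size = 1024; };
// Toy predicate (assumption): a fragment matches if its path has this prefix.
struct Expression { std::string path_prefix; };

class Fragment {
 public:
  // Item 3: the schema is given at construction and stored as a property.
  Fragment(std::string path, std::shared_ptr<Schema> schema)
      : path_(std::move(path)), schema_(std::move(schema)) {}

  const std::string& path() const { return path_; }
  const std::shared_ptr<Schema>& schema() const { return schema_; }

  // Item 1: ScanOptions is a parameter of Scan, not a Fragment property.
  // The toy body just reports the batch size it would scan with.
  int Scan(const ScanOptions& options) const { return options.batch_size; }

 private:
  std::string path_;
  std::shared_ptr<Schema> schema_;
};

class Dataset {
 public:
  Dataset(std::shared_ptr<Schema> schema, std::vector<std::string> paths)
      : schema_(std::move(schema)), paths_(std::move(paths)) {}

  // Item 2: listing fragments needs no ScanOptions.
  std::vector<Fragment> GetFragments() const {
    std::vector<Fragment> fragments;
    for (const auto& path : paths_) fragments.emplace_back(path, schema_);
    return fragments;
  }

  // Item 2 (alternate signature): predicate-based pruning.
  std::vector<Fragment> GetFragments(
      std::shared_ptr<Expression> predicate) const {
    std::vector<Fragment> fragments;
    const std::string& prefix = predicate->path_prefix;
    for (const auto& path : paths_) {
      // Keep only paths that start with the predicate's prefix.
      if (path.compare(0, prefix.size(), prefix) == 0) {
        fragments.emplace_back(path, schema_);
      }
    }
    return fragments;
  }

 private:
  std::shared_ptr<Schema> schema_;
  std::vector<std::string> paths_;
};
```

With this shape, a caller can enumerate fragments (or write to a path, as in ARROW-7824) without constructing a ScanOptions, and only passes one at the point where a scan actually happens.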
--
This message was sent by Atlassian Jira
(v8.3.4#803005)