Francois Saint-Jacques created ARROW-8065:
---------------------------------------------
Summary: [C++][Dataset] Untangle Dataset, Fragment and ScanOptions
Key: ARROW-8065
URL: https://issues.apache.org/jira/browse/ARROW-8065
Project: Apache Arrow
Issue Type: Improvement
Reporter: Francois Saint-Jacques
We should be able to list fragments without jumping through the
Scanner/ScanOptions hoops. This exposes a flaw in the current API: it
requires a ScanOptions to create a Fragment. This is also a problem for
ARROW-7824: why should we need a ScanOptions (a read manifest) to write
record batches to a given path?
# Remove {{ScanOptions}} from Fragment's properties and make it a parameter
of {{Fragment::Scan}} instead.
# Remove {{ScanOptions}} from {{Dataset::GetFragments}}. If required, we can
still provide an alternate signature, e.g.
{{Dataset::GetFragments(std::shared_ptr<Expression> predicate)}}, for sub-tree
pruning in FileSystemDataset.
# The Fragment constructor should take a schema (and store it as a property),
usually taken from the Dataset's schema. Update the {{schema()}} method
accordingly.
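The three proposed changes can be sketched with toy stand-in types. Everything below ({{Schema}}, {{Expression}}, {{ScanTask}}, {{FileFragment}}, the (path, partition value) constructor, the equality-only predicate) is hypothetical and is not the real arrow::dataset API. It only illustrates the intended shape: a Fragment stores a schema, ScanOptions is passed to {{Fragment::Scan}}, and {{Dataset::GetFragments}} needs no ScanOptions.

```cpp
// Hypothetical sketch of the shape proposed in ARROW-8065. Toy stand-ins,
// NOT the real arrow::dataset classes or signatures.
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Schema { std::vector<std::string> field_names; };
// Assumption: a toy predicate that keeps fragments whose partition value equals `equals`.
struct Expression { std::string equals; };
struct ScanOptions { std::vector<std::string> projected_columns; };
struct ScanTask {
  std::string path;
  std::vector<std::string> columns;
};

class Fragment {
 public:
  // Item 3: the schema is passed in (usually the Dataset's) and stored.
  explicit Fragment(std::shared_ptr<Schema> schema) : schema_(std::move(schema)) {
    assert(schema_ != nullptr);
  }
  virtual ~Fragment() = default;
  const std::shared_ptr<Schema>& schema() const { return schema_; }
  // Item 1: ScanOptions is an argument of Scan(), not a Fragment property.
  virtual std::vector<ScanTask> Scan(const ScanOptions& options) const = 0;

 private:
  std::shared_ptr<Schema> schema_;
};

using FragmentVector = std::vector<std::shared_ptr<Fragment>>;

class FileFragment : public Fragment {
 public:
  FileFragment(std::string path, std::string partition_value,
               std::shared_ptr<Schema> schema)
      : Fragment(std::move(schema)),
        path_(std::move(path)),
        partition_value_(std::move(partition_value)) {}
  const std::string& partition_value() const { return partition_value_; }
  std::vector<ScanTask> Scan(const ScanOptions& options) const override {
    // One task per file; the projection comes from the caller's options.
    return {ScanTask{path_, options.projected_columns}};
  }

 private:
  std::string path_;
  std::string partition_value_;
};

class Dataset {
 public:
  explicit Dataset(std::shared_ptr<Schema> schema) : schema_(std::move(schema)) {}
  virtual ~Dataset() = default;
  const std::shared_ptr<Schema>& schema() const { return schema_; }
  // Item 2: enumerating fragments requires no ScanOptions.
  virtual FragmentVector GetFragments() const = 0;

 protected:
  std::shared_ptr<Schema> schema_;
};

class FileSystemDataset : public Dataset {
 public:
  // Each entry is (path, partition value); every fragment shares the dataset schema.
  FileSystemDataset(std::shared_ptr<Schema> schema,
                    const std::vector<std::pair<std::string, std::string>>& files)
      : Dataset(std::move(schema)) {
    for (const auto& file : files) {
      fragments_.push_back(
          std::make_shared<FileFragment>(file.first, file.second, schema_));
    }
  }

  FragmentVector GetFragments() const override {
    return FragmentVector(fragments_.begin(), fragments_.end());
  }

  // Optional overload for sub-tree pruning; a null predicate keeps every fragment.
  FragmentVector GetFragments(const std::shared_ptr<Expression>& predicate) const {
    FragmentVector out;
    for (const auto& fragment : fragments_) {
      if (predicate == nullptr || fragment->partition_value() == predicate->equals) {
        out.push_back(fragment);
      }
    }
    return out;
  }

 private:
  std::vector<std::shared_ptr<FileFragment>> fragments_;
};
```

With this shape, listing fragments is a plain {{GetFragments()}} call, and ScanOptions only appears at the point where data is actually read, which is the separation the issue asks for.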
--
This message was sent by Atlassian Jira
(v8.3.4#803005)