[ https://issues.apache.org/jira/browse/ARROW-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francois Saint-Jacques updated ARROW-8065:
------------------------------------------
    Component/s: C++ - Dataset

> [C++][Dataset] Untangle Dataset, Fragment and ScanOptions
> ---------------------------------------------------------
>
>                 Key: ARROW-8065
>                 URL: https://issues.apache.org/jira/browse/ARROW-8065
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++ - Dataset
>            Reporter: Francois Saint-Jacques
>            Priority: Major
>
> We should be able to list fragments without going through the
> Scanner/ScanOptions hoops. This exposes a flaw in the current API: it
> requires a ScanOptions to create a Fragment. This is also a problem for
> ARROW-7824, i.e. why should we need a ScanOptions (read manifest) to write
> record batches to a given path?
>  # Remove {{ScanOptions}} from Fragment's properties and move it into the
> parameters of {{Fragment::Scan}}.
>  # Remove {{ScanOptions}} from {{Dataset::GetFragments}}. If required, we can
> still provide an alternate signature, e.g.
> {{Dataset::GetFragments(std::shared_ptr<Expression> predicate)}}, for sub-tree
> pruning in FileSystemDataset.
>  # The Fragment constructor should take a schema (and store it as a property),
> usually extracted from the Dataset schema. Update the schema() method
> accordingly.
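A minimal, self-contained sketch of what the three proposed changes could look like together. Every type here (Schema, Expression, ScanOptions, Fragment, Dataset) is a hypothetical stand-in, not the real arrow::dataset class; Scan() returns strings in place of record batches, and the predicate overload prunes by exact partition-string match purely for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the real Arrow classes.
struct Schema {
  std::vector<std::string> field_names;
};

struct Expression {
  std::string partition;  // hypothetical: matches fragments tagged with this value
};

struct ScanOptions {
  std::vector<std::string> projection;  // columns to read
};

// Item 3: the Fragment stores the schema it was constructed with.
// Item 1: ScanOptions is a parameter of Scan(), not a member of Fragment.
class Fragment {
 public:
  Fragment(std::string path, std::string partition, std::shared_ptr<Schema> schema)
      : path_(std::move(path)), partition_(std::move(partition)), schema_(std::move(schema)) {}

  const std::shared_ptr<Schema>& schema() const { return schema_; }
  const std::string& path() const { return path_; }
  const std::string& partition() const { return partition_; }

  // Options are supplied only when actually scanning. Returns "path:column"
  // strings as a placeholder for the record batches a real scan would yield.
  std::vector<std::string> Scan(const ScanOptions& options) const {
    std::vector<std::string> out;
    for (const auto& column : options.projection) out.push_back(path_ + ":" + column);
    return out;
  }

 private:
  std::string path_;
  std::string partition_;
  std::shared_ptr<Schema> schema_;
};

// Item 2: GetFragments() takes no ScanOptions; an optional overload accepts a
// predicate for sub-tree pruning.
class Dataset {
 public:
  explicit Dataset(std::shared_ptr<Schema> schema) : schema_(std::move(schema)) {}

  void AddFile(std::string path, std::string partition) {
    // Each fragment receives the dataset schema at construction time.
    fragments_.push_back(
        std::make_shared<Fragment>(std::move(path), std::move(partition), schema_));
  }

  std::vector<std::shared_ptr<Fragment>> GetFragments() const { return fragments_; }

  std::vector<std::shared_ptr<Fragment>> GetFragments(
      const std::shared_ptr<Expression>& predicate) const {
    if (!predicate) return fragments_;  // no predicate: nothing to prune
    std::vector<std::shared_ptr<Fragment>> out;
    for (const auto& fragment : fragments_) {
      if (fragment->partition() == predicate->partition) out.push_back(fragment);
    }
    return out;
  }

 private:
  std::shared_ptr<Schema> schema_;
  std::vector<std::shared_ptr<Fragment>> fragments_;
};
```

With this shape, fragments can be listed and pruned without constructing a ScanOptions at all; the options appear only at the final Fragment::Scan call, which is also the only place a writer path (as in ARROW-7824) would never need to touch.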



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
