ASF GitHub Bot updated ARROW-8164:
    Labels: pull-request-available  (was: )

> [C++][Dataset] Let datasets be viewable with non-identical schema
> -----------------------------------------------------------------
>                 Key: ARROW-8164
>                 URL: https://issues.apache.org/jira/browse/ARROW-8164
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, C++ - Dataset
>    Affects Versions: 0.16.0
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
> It would be useful to allow some schema unification capability after
> discovery has completed. For example, if a FileSystemDataset is being wrapped
> into a UnionDataset with another dataset and their schemas are unifiable, then
> there is no reason we can't create the UnionDataset (rather than emitting an
> error because the schemas are not identical).
> I think this behavior will be most naturally expressed in C++ like so:
> {code}
> // Declared in class Dataset. Dataset is abstract, so the result holds a pointer.
> virtual Result<std::shared_ptr<Dataset>> ReplaceSchema(
>     std::shared_ptr<Schema> schema) const = 0;
> {code}
> which will raise an error if the provided schema is not unifiable with the 
> current dataset schema.
> If this needs to be extended to nontrivial projections then this will
> probably warrant a separate class, {{ProjectedDataset}} or so. That is
> definitely follow-up material (if desired).
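
The proposed check can be illustrated with a minimal, self-contained sketch. It does not use the Arrow API: Field, Schema, Dataset and CheckUnifiable below are hypothetical stand-ins, and the unifiability rule is an assumption made for illustration (a field present in both schemas must keep its type, and a field absent from the current schema must be nullable). Errors are reported through an out-parameter instead of Result to keep the example dependency-free.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for arrow::Field and arrow::Schema.
struct Field {
  std::string name;
  std::string type;  // e.g. "int32", "utf8"
  bool nullable;
};

struct Schema {
  std::vector<Field> fields;

  // Returns the first field with the given name, or nullptr if none exists.
  const Field* GetFieldByName(const std::string& name) const {
    for (const Field& f : fields) {
      if (f.name == name) return &f;
    }
    return nullptr;
  }
};

// Assumed rule: `to` is an acceptable view of `from` if every field of `to`
// either matches the type of the same-named field in `from`, or is absent
// from `from` and nullable (so it can be filled with nulls).
// Returns an empty string on success, otherwise a description of the conflict.
std::string CheckUnifiable(const Schema& from, const Schema& to) {
  for (const Field& want : to.fields) {
    const Field* have = from.GetFieldByName(want.name);
    if (have == nullptr) {
      if (!want.nullable) {
        return "field '" + want.name + "' is missing and not nullable";
      }
      continue;
    }
    if (have->type != want.type) {
      return "field '" + want.name + "' has type " + have->type +
             " but " + want.type + " was requested";
    }
  }
  return "";
}

// Hypothetical dataset that only tracks its schema.
class Dataset {
 public:
  explicit Dataset(std::shared_ptr<Schema> schema) : schema_(std::move(schema)) {}

  const std::shared_ptr<Schema>& schema() const { return schema_; }

  // Returns a new dataset viewed with `schema`, or nullptr (with `*error`
  // set, if provided) when `schema` cannot be unified with the current one.
  std::shared_ptr<Dataset> ReplaceSchema(std::shared_ptr<Schema> schema,
                                         std::string* error) const {
    std::string problem = CheckUnifiable(*schema_, *schema);
    if (!problem.empty()) {
      if (error != nullptr) *error = problem;
      return nullptr;
    }
    return std::make_shared<Dataset>(std::move(schema));
  }

 private:
  std::shared_ptr<Schema> schema_;
};
```

Under these assumptions, a dataset with fields (a: int32, b: utf8) could be viewed with an extra nullable field c, while a request to view a as utf8 would be rejected. A UnionDataset could then call ReplaceSchema on each child before combining them.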

