Ben Kietzman created ARROW-8164:
-----------------------------------

             Summary: [C++][Dataset] Let datasets be viewable with non-identical schema
                 Key: ARROW-8164
                 URL: https://issues.apache.org/jira/browse/ARROW-8164
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, C++ - Dataset
    Affects Versions: 0.16.0
            Reporter: Ben Kietzman
            Assignee: Ben Kietzman
             Fix For: 1.0.0


It would be useful to allow some schema unification capability after discovery
has completed. For example, if a FileSystemDataset is being wrapped into a
UnionDataset with another dataset and their schemas are unifiable, there is no
reason we can't create the UnionDataset (rather than emitting an error because
the schemas are not identical).
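To make "unifiable" concrete, here is a minimal sketch built on hypothetical toy types ({{ToyField}}, {{ToySchema}}, {{UnifyToySchemas}}), not Arrow's real classes. It assumes two schemas are unifiable when every column name present in both has the same type, and that the unified schema is the union of their columns. It illustrates the idea only and is not Arrow's implementation.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an Arrow field: just a name and a type name.
struct ToyField {
  std::string name;
  std::string type;
};

using ToySchema = std::vector<ToyField>;

// Merge two schemas: keep the fields of `a` in order, then append the fields
// of `b` whose names are new. Returns std::nullopt if a name appears in both
// schemas with different types (assumed meaning of "not unifiable").
std::optional<ToySchema> UnifyToySchemas(const ToySchema& a,
                                         const ToySchema& b) {
  ToySchema unified = a;
  std::map<std::string, std::string> types;
  for (const auto& f : a) types[f.name] = f.type;
  for (const auto& f : b) {
    auto it = types.find(f.name);
    if (it == types.end()) {
      types[f.name] = f.type;  // new column: add it to the unified schema
      unified.push_back(f);
    } else if (it->second != f.type) {
      return std::nullopt;  // same column name, conflicting types
    }
  }
  return unified;
}
```

Under this assumed rule, a dataset with columns {{id: int64, name: string}} and one with {{id: int64, score: double}} unify to three columns, while a dataset whose {{id}} column is a string would be rejected.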

I think this behavior will be most naturally expressed in C++ like so:

{code}
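// Declared as a member of class Dataset; returns a new Dataset viewed
// through `schema` (Dataset is abstract, so it is returned by pointer).
virtual Result<std::shared_ptr<Dataset>> ReplaceSchema(
    std::shared_ptr<Schema> schema) const = 0;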
{code}

which would return an error if the provided schema is not unifiable with the
current dataset schema.
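A minimal sketch of how such a {{ReplaceSchema}} check might behave, using hypothetical toy types ({{ToyDataset}}, {{ToyResult}}, {{ToySchema}}) in place of Arrow's {{Dataset}}, {{Result}} and {{Schema}}. The compatibility rule here is an assumption, not the rule Arrow adopts: the requested schema must contain every existing column with the same type, and may add extra columns. Under that rule, each child of a UnionDataset could be re-viewed through a schema that unifies all the children.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Arrow's Field/Schema (name + type name only).
struct ToyField {
  std::string name;
  std::string type;
};
using ToySchema = std::vector<ToyField>;

// Minimal error-or-value holder standing in for arrow::Result<T>.
template <typename T>
struct ToyResult {
  T value{};
  std::string error;  // empty means success
  bool ok() const { return error.empty(); }
};

class ToyDataset {
 public:
  explicit ToyDataset(ToySchema schema) : schema_(std::move(schema)) {}
  const ToySchema& schema() const { return schema_; }

  // Returns a new dataset viewed through `schema`. Assumed rule: every
  // existing column must appear in `schema` with the same type; extra
  // columns are allowed (a scanner would presumably fill them with nulls).
  ToyResult<std::shared_ptr<ToyDataset>> ReplaceSchema(ToySchema schema) const {
    for (const auto& current : schema_) {
      bool found = false;
      for (const auto& requested : schema) {
        if (requested.name != current.name) continue;
        if (requested.type != current.type) {
          return {nullptr, "type mismatch for column '" + current.name + "'"};
        }
        found = true;
      }
      if (!found) {
        return {nullptr, "column '" + current.name + "' missing from new schema"};
      }
    }
    return {std::make_shared<ToyDataset>(std::move(schema)), ""};
  }

 private:
  ToySchema schema_;
};
```

The original dataset is left untouched; a successful call yields a separate dataset object carrying the wider schema, which matches the {{const}} signature proposed above.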

If this needs to be extended to non-trivial projections, it will probably
warrant a separate class, {{ProjectedDataset}} or similar. That is definitely
follow-up material (if desired).


