[jira] [Created] (ARROW-18388) [C++] Decide on duplicate column handling in scanner, add more tests

Weston Pace (Jira) Tue, 22 Nov 2022 15:35:06 -0800

Weston Pace created ARROW-18388:
-----------------------------------

             Summary: [C++] Decide on duplicate column handling in scanner, add more tests
                 Key: ARROW-18388
                 URL: https://issues.apache.org/jira/browse/ARROW-18388
             Project: Apache Arrow
          Issue Type: Sub-task
          Components: C++
            Reporter: Weston Pace



When a schema has duplicate column names, it can be difficult to know how to map
between the fragment schema and the dataset schema in the default evolution
strategy. The comments describing evolution do not make clear what the exact
behavior is right now. Some suggestions have been:

 * Grab the first column in the fragment schema with the same name
 * Always raise an error if there are duplicate columns
 * Allow duplicate columns, but expect the same number of occurrences in both
the fragment and dataset schemas and assume the order is consistent
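To make the three suggestions concrete, here is a minimal C++ sketch. It is not Arrow's API: schemas are reduced to a hypothetical `FieldNames` vector of field names, and `MatchFirst`, `MatchRejectDuplicates` and `MatchByOccurrence` are illustrative names only. Each function resolves the dataset field at `dataset_index` to a fragment column index and returns `std::nullopt` when the fragment does not contain that column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplification: a schema is reduced to its ordered field names.
using FieldNames = std::vector<std::string>;

// Number of fields in `names` called `name`.
std::size_t CountName(const FieldNames& names, const std::string& name) {
  return static_cast<std::size_t>(std::count(names.begin(), names.end(), name));
}

// Suggestion 1: map to the first fragment column with the same name.
std::optional<std::size_t> MatchFirst(const FieldNames& dataset,
                                      std::size_t dataset_index,
                                      const FieldNames& fragment) {
  assert(dataset_index < dataset.size());
  auto it = std::find(fragment.begin(), fragment.end(), dataset[dataset_index]);
  if (it == fragment.end()) return std::nullopt;  // absent from the fragment
  return static_cast<std::size_t>(it - fragment.begin());
}

// Suggestion 2: raise an error if the name is duplicated in either schema.
// (This sketch only checks the column being resolved.)
std::optional<std::size_t> MatchRejectDuplicates(const FieldNames& dataset,
                                                 std::size_t dataset_index,
                                                 const FieldNames& fragment) {
  assert(dataset_index < dataset.size());
  const std::string& name = dataset[dataset_index];
  if (CountName(dataset, name) > 1 || CountName(fragment, name) > 1) {
    throw std::invalid_argument("duplicate column name: " + name);
  }
  return MatchFirst(dataset, dataset_index, fragment);
}

// Suggestion 3: the k-th occurrence of a name in the dataset schema maps to
// the k-th occurrence in the fragment schema; occurrence counts must agree.
std::optional<std::size_t> MatchByOccurrence(const FieldNames& dataset,
                                             std::size_t dataset_index,
                                             const FieldNames& fragment) {
  assert(dataset_index < dataset.size());
  const std::string& name = dataset[dataset_index];
  // k = number of earlier dataset fields that share this name.
  auto end = dataset.begin() + static_cast<std::ptrdiff_t>(dataset_index);
  auto k = static_cast<std::size_t>(std::count(dataset.begin(), end, name));
  std::vector<std::size_t> positions;  // fragment indices holding `name`
  for (std::size_t i = 0; i < fragment.size(); ++i) {
    if (fragment[i] == name) positions.push_back(i);
  }
  if (positions.empty()) return std::nullopt;  // absent from the fragment
  if (positions.size() != CountName(dataset, name)) {
    throw std::invalid_argument("occurrence count mismatch for column: " + name);
  }
  return positions[k];
}
```

For example, with a dataset schema (a, b, a) and a fragment schema (b, a, a), the first suggestion maps both dataset "a" fields to fragment column 1, the second throws, and the third maps them to fragment columns 1 and 2 respectively.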



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
