[ 
https://issues.apache.org/jira/browse/ARROW-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509748#comment-17509748
 ] 

Alessandro Molina commented on ARROW-15978:
-------------------------------------------

When a schema is explicitly provided, it might be reasonable to promote single 
entries to lists wherever the schema declares that field as a list. 
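
As a rough illustration of the explicit-schema case, here is a minimal Python 
sketch that does the promotion as a preprocessing step, before the data reaches 
any reader. It uses only the standard json module. The names promote_to_lists 
and list_fields are hypothetical, made up for this example, and are not part of 
Arrow's API.

```python
import json


def promote_to_lists(ndjson_text, list_fields):
    """Wrap single values in a list for every field named in list_fields.

    Hypothetical helper: list_fields stands in for "the fields the explicit
    schema declares as lists". Null values are left untouched.
    """
    rows = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        for name in list_fields:
            value = record.get(name)
            if value is not None and not isinstance(value, list):
                record[name] = [value]
        rows.append(record)
    return rows


sample = (
    '{"author": "Hunter Thompson"}\n'
    '{"author": ["Bob Woodward", "Carl Bernstein"]}\n'
)
rows = promote_to_lists(sample, list_fields={"author"})
```

With the two-line example from the issue description, both rows end up with 
"author" as a list of strings, so a list-typed column can be built from them.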

Automatic promotion is a harder topic. There is a real risk of it evolving into 
a multi-criteria decision system, as users might start requesting behaviour 
changes once the guessing logic is released. For example, a user might argue 
that if all entries are lists and only one is a single value, we should raise 
an error instead of promoting the single value, because it was obviously a 
wrong entry. Requests like this might pile up, and the system would quickly 
become too complex to work reliably.
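
To make the concern concrete, the Python sketch below shows how even this one 
disputed case already turns type inference into a policy switch. The function 
infer_field_kind and its on_mixed parameter are hypothetical and only 
illustrate the design problem; they do not describe any existing Arrow option.

```python
def infer_field_kind(values, on_mixed="promote"):
    """Classify a column as "scalar" or "list" from its raw JSON values.

    Hypothetical sketch: on_mixed decides what happens when single values
    and lists appear in the same column. Nulls are ignored.
    """
    if on_mixed not in ("promote", "error"):
        raise ValueError(f"unknown on_mixed policy: {on_mixed!r}")

    saw_list = any(isinstance(v, list) for v in values)
    saw_scalar = any(v is not None and not isinstance(v, list) for v in values)

    if saw_list and saw_scalar:
        if on_mixed == "error":
            raise ValueError("field mixes single values and lists")
        return "list"  # promote the single values
    return "list" if saw_list else "scalar"
```

Each further request (a threshold on how many single values are tolerated, 
per-field overrides, and so on) would add another parameter of this kind.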

> [C++] Have JSON reader treat mixed singleton/array fields as arrays.
> --------------------------------------------------------------------
>
>                 Key: ARROW-15978
>                 URL: https://issues.apache.org/jira/browse/ARROW-15978
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python, R
>    Affects Versions: 7.0.0
>            Reporter: Ben Schmidt
>            Priority: Minor
>
> I frequently encounter real-world files that mix array types and singletons 
> across entries for a single field. For example, consider an ndjson file 
> consisting of:
>  
> {code:java}
> {"author": "Hunter Thompson"}
> {"author": ["Bob Woodward", "Carl Bernstein"]} {code}
> Widely used specs promote writing JSON like this, where a singleton isn't 
> wrapped in array brackets. For example, the 'target' field in the W3C 
> annotation model [may be a string or an array of 
> strings.|https://www.w3.org/TR/annotation-model/#:~:text=The%20body%20and/or%20target%20relationships%20of%20the%20Annotation%20may%20be%20arrays%20rather%20than%20a%20single%20object.]
>  
>  
> Currently I see no way to read this sort of data with the C++ JSON reader. It 
> would be nice if Arrow's ndjson reader could do two things to support data 
> like this.
>  # When inferring types, silently promote entries of type <T> to type <T[]> 
> if the column is mixed;
>  # When passed an explicit schema that includes a ListArray, promote all 
> instances of the field to an array if they aren't already. 
> My sense is that this might be pretty simple.
> Thanks to everyone who works on this project.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
