Re: [I] [Python][C++] Flexible handling of conflicts during JSON type inference [arrow]

via GitHub Mon, 11 Aug 2025 11:06:00 -0700


antoniobadia commented on issue #46040:
URL: https://github.com/apache/arrow/issues/46040#issuecomment-3176227414


   Arrow's JSON reader fails when a value cannot be converted to the type 
inferred for its column. That is, it expects every path in the JSON input to 
be associated with a single data type. It fails in all of the following cases 
(examples produced with pyarrow):
       "name": {
         "firstName": "Duckota",
         "lastName": "Fanning"
       },
   ...
       "name": "Jim Jones",
   
   #pyarrow.lib.ArrowInvalid: JSON parse error: Column(/name) changed from object to string in row 3
   
       "dimensions": {
         "height": "six foot",
         "weight": "165 lbs"
       },
   ...
       "dimensions": {
        "height": 6.2,
        "weight": 185
       },
   #pyarrow.lib.ArrowInvalid: JSON parse error: Column(/dimensions/height) changed from string to number in row 3
   {"a":1,"b":"foo"}
   {"a":2,"b":"bar"}
   {"a":3, "c":{"d":4, "e":5}}
   #pyarrow.lib.ArrowInvalid: JSON parse error: Column(/d/[]) changed from number to string in row 3
   {"a":11, "d":[12, "h"]}
   {"a":6, "d":[7, 8, 9]}
   #pyarrow.lib.ArrowInvalid: JSON parse error: Column(/d/[]) changed from number to string in row 4
   {"a":10, "d":["f", "g"]}
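
   As a rough illustration of the conflicts above, here is a minimal 
pure-Python sketch that scans newline-delimited JSON and reports every path 
seen with more than one type. It is not Arrow's inference code and does not 
use pyarrow; the names json_kind, collect_kinds and find_conflicts are 
hypothetical, and ignoring null when comparing types is an assumption made 
for this sketch.

```python
import json
from collections import defaultdict


def json_kind(value):
    """Map a parsed JSON value to a coarse type name (naming is illustrative)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def collect_kinds(value, path, kinds):
    """Record the kind seen at `path`, then recurse into objects and arrays."""
    kinds[path].add(json_kind(value))
    if isinstance(value, dict):
        for key, child in value.items():
            collect_kinds(child, f"{path}/{key}", kinds)
    elif isinstance(value, list):
        for child in value:
            # All array elements share one path, written "/[]" as in the errors above.
            collect_kinds(child, f"{path}/[]", kinds)


def find_conflicts(lines):
    """Return {path: sorted kinds} for every path seen with more than one kind."""
    kinds = defaultdict(set)
    for line in lines:
        row = json.loads(line)
        for key, child in row.items():
            collect_kinds(child, f"/{key}", kinds)
    # Assumption for this sketch: null is compatible with any other kind.
    return {
        path: sorted(seen - {"null"})
        for path, seen in kinds.items()
        if len(seen - {"null"}) > 1
    }


rows = [
    '{"a":6, "d":[7, 8, 9]}',
    '{"a":11, "d":[12, "h"]}',
    '{"a":10, "d":["f", "g"]}',
]
print(find_conflicts(rows))  # {'/d/[]': ['number', 'string']}
```

   Each reported path is one where a reader that requires a single type per 
path would have to fail, and where a union type would be needed instead.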
   
   This may be a deliberate limitation, given the difficulty of converting 
such data into columnar storage. I'm currently working on a project that 
handles this by automatically using UNIONs in such cases. Please let me know 
if this is already being done or addressed (or if you are interested in 
joining the project).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
