[ 
https://issues.apache.org/jira/browse/ARROW-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-9020:
--------------------------------------

    Assignee: Krisztian Szucs

> [Python] read_json won't respect explicit_schema in parse_options
> -----------------------------------------------------------------
>
>                 Key: ARROW-9020
>                 URL: https://issues.apache.org/jira/browse/ARROW-9020
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.1
>         Environment: CPython 3.8.2, MacOS Mojave 10.14.6
>            Reporter: Felipe Santos
>            Assignee: Krisztian Szucs
>            Priority: Major
>             Fix For: 1.0.0
>
>
> I am trying to read a JSON file using an explicit schema, but it looks like 
> the schema is ignored. Moreover, if my schema contains a field not 
> present in the JSON file, then the output table contains all the fields in 
> the JSON file plus the fields of my schema not found in the file.
> A minimal example:
> {code:python}
> import pyarrow as pa
> from pyarrow import json
> # allowing for type inference
> print(json.read_json('tmp.json'))
> # prints:
> # pyarrow.Table
> # foo: string
> # baz: string
> # using an explicit schema that would read only "foo"
> schema = pa.schema([('foo', pa.string())])
> print(json.read_json('tmp.json', 
> parse_options=json.ParseOptions(explicit_schema=schema)))
> # prints:
> # pyarrow.Table
> # foo: string
> # baz: string
> # using an explicit schema that would read only "not_a_field",
> # which is not present in the json file
> schema = pa.schema([('not_a_field', pa.string())])
> print(json.read_json('tmp.json', 
> parse_options=json.ParseOptions(explicit_schema=schema)))
> # prints:
> # pyarrow.Table
> # not_a_field: string
> # foo: string
> # baz: string
> {code}
> And the tmp.json file looks like:
> {code:json}
> {"foo": "bar", "baz": "1"}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
