Kowol opened a new issue, #13177:
URL: https://github.com/apache/arrow/issues/13177
Hey!
I'm trying to read JSON with an explicit schema, like so:
**Input file** (`issue.json`):
```json
{"id": "value", "nested": {"value": 1}}
{"id": "value", "nested": {"value": 1}}
```
**Code:**
```python
import pyarrow.json as pj
import pyarrow as pa
schema = pa.schema([
    pa.field("id", pa.string(), False),
    pa.field("nested", pa.struct([pa.field("value", pa.int64(), False)])),
])
table = pj.read_json(
    './issue.json',
    parse_options=pj.ParseOptions(explicit_schema=schema),
)
print(schema)
print(table.schema)
```
But the resulting table schema is different: it doesn't contain the not-null
constraints.
**Provided explicit schema:**
```
id: string not null
nested: struct<value: int64 not null>
child 0, value: int64 not null
```
**Table schema:**
```
id: string
nested: struct<value: int64>
child 0, value: int64
```
I also tried casting the table to the schema (`table.cast(schema)`). That works
for the top-level not-null constraint, but for the nested struct it throws an error:
```
pyarrow.lib.ArrowTypeError: cannot cast nullable field to non-nullable field: struct<value: int64> struct<value: int64 not null>
```
Is there another way to force the schema?