Kowol opened a new issue, #13177:
URL: https://github.com/apache/arrow/issues/13177

   Hey!
   
   I'm trying to read JSON using an explicit schema, like so:
   **Input file** (`issue.json`):
   ```json
   {"id": "value", "nested": {"value": 1}}
   {"id": "value", "nested": {"value": 1}}
   ```
   
   **Code:**
   ```python
   import pyarrow.json as pj
   import pyarrow as pa
   
   schema = pa.schema([
       pa.field("id", pa.string(), False),
       pa.field("nested", pa.struct([pa.field("value", pa.int64(), False)]))
   ])
   
   table = pj.read_json('./issue.json',
                        parse_options=pj.ParseOptions(explicit_schema=schema))
   
   print(schema)
   print(table.schema)
   ```
   
   But the table schema is different: it doesn't contain the not-null constraints.
   
   **Provided explicit schema:**
   ```
   id: string not null
   nested: struct<value: int64 not null>
     child 0, value: int64 not null
   ```
   
   **Table schema:**
   ```
   id: string
   nested: struct<value: int64>
     child 0, value: int64
   ```
   
   
   I also tried casting the table to the schema (`table.cast(schema)`). That works for the top-level not-null constraint, but for the nested struct it throws an error:
   ```
   pyarrow.lib.ArrowTypeError: cannot cast nullable field to non-nullable field: struct<value: int64> struct<value: int64 not null>
   ```
   
   Is there another way to force the schema? 

