christiangiessleraracom opened a new issue, #36060:
URL: https://github.com/apache/arrow/issues/36060

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   The problem is as follows:
   If an optional field is present in the schema but missing from the JSON, it 
is still created in the pyarrow table, including all nested fields that are 
specified in the schema (with null values).
   Is this the intended behaviour, or is there an option to keep fields that 
are absent from the data out of the table?
   
   
   Here is a data example. Our production data schema is of course much more 
complex and more deeply nested, but this illustrates what I am doing:
   
   Schema (all fields are nullable):
   ```
   field1: struct<subfield1: double, subfield2: double>
   field2: timestamp[ms]
   field3: double
   
   ```
   
   JSON file:
   ```json
   {
     "field3": 123.4
   }
   ```
   
   Python code handling the data:
   
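   ```python
   import pyarrow.json as pajson
   import pyarrow.parquet as pq

   # pa_schema, tmp_file_name, dataset_path and hashvalue are defined elsewhere.
   # Use a large block size (in bytes) when reading the file.
   read_options = pajson.ReadOptions(block_size=1600000000)
   
   # Parse against the explicit schema; JSON fields not in the schema are ignored.
   parse_options = pajson.ParseOptions(
       explicit_schema=pa_schema,
       unexpected_field_behavior="ignore",
   )
   table = pajson.read_json(
       tmp_file_name, read_options=read_options, parse_options=parse_options
   )
   
   # Write the table to the Parquet dataset.
   pq.write_to_dataset(
       table=table,
       root_path=dataset_path,
       basename_template=hashvalue + ".parquet",
       existing_data_behavior="overwrite_or_ignore",
       schema=pa_schema,
   )
   ```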
   
   Table debug output from evaluation in PyCharm:
   ```
   column_names: ['field1', 'field2', 'field3']
   columns:
   [
     -- is_valid:
         [
         false
       ]
     -- child 0 type: double
       [
         null
       ]
     -- child 1 type: double
       [
         null
       ]
   ]
   [
     [
       null
     ]
   ]
   [
     [
       123.4
     ]
   ]
   
   ```
   
   ### Component(s)
   
   Python

