ShaimaaSabry opened a new issue, #43165:
URL: https://github.com/apache/arrow/issues/43165

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I have the following schema:
   `bankrupcy: struct<bankrupcy_flag: bool, bankrupcy_records: list<item: struct<value: string>>>`
   
   and the following data:
   `[{'bankrupcy': {'bankrupcy_flag': False, 'bankrupcy_records': []}}]`
   where "bankrupcy_records" happens to be an empty array.
   
   I am trying to write the data to parquet using this code:
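   ```python
   from collections import defaultdict

   import pyarrow as pa
   from pyarrow import Scalar, Table
   from pyarrow.parquet import ParquetWriter

   # Target schema, as described above
   schema = pa.schema([
       ("bankrupcy", pa.struct([
           ("bankrupcy_flag", pa.bool_()),
           ("bankrupcy_records", pa.list_(pa.struct([("value", pa.string())]))),
       ])),
   ])

   headers = ["bankrupcy"]
   data = [
       {
           "bankrupcy": {
               "bankrupcy_flag": False,
               "bankrupcy_records": []
           }
       }
   ]

   # Transpose the rows into a dict of columns
   data_transposed = defaultdict(list)
   for row in data:
       for header in headers:
           value = row.get(header)
           # Unwrap pyarrow scalars into plain Python values
           value = value.as_py() if isinstance(value, Scalar) else value
           data_transposed[header].append(value)

   table = Table.from_pydict(data_transposed)

   writer = ParquetWriter('example.parquet', schema, version="1.0")
   writer.write_table(table)  # raises the ValueError shown below
   writer.close()
   ```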
   
   I am getting this error:
   ```
   ValueError: Table schema does not match schema used to create file:
   table:
   bankrupcy: struct<bankrupcy_flag: bool, bankrupcy_records: list<item: null>>
     child 0, bankrupcy_flag: bool
     child 1, bankrupcy_records: list<item: null>
         child 0, item: null vs.
   file:
   bankrupcy: struct<bankrupcy_flag: bool, bankrupcy_records: list<item: struct<value: string>>>
     child 0, bankrupcy_flag: bool
     child 1, bankrupcy_records: list<item: struct<value: string>>
         child 0, item: struct<value: string>
             child 0, value: string
   ```
   
   How can I define the schema so that pyarrow also accepts an empty list as the value of "bankrupcy_records"?
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

