mikelui opened a new pull request, #37376:
URL: https://github.com/apache/arrow/pull/37376

   
   
   ### Rationale for this change
   
   See: #32439 
   
   ### What changes are included in this PR?
   
   During conversion from Python to Arrow, a struct's child can hit a 
capacity error, which triggers chunking. This can leave the Finish'd chunk in 
an invalid state, because the struct's length no longer matches the length of 
its children.
   
   This change Appends to the children first and Appends to the struct only 
if that succeeds. This is safe because the order of Append'ing between the 
struct and its children is not specified; the only requirement is that they 
stay consistent with each other.
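
To illustrate why the ordering matters, here is a minimal toy model in plain Python. It is not Arrow's API: `ChildBuilder`, `StructBuilder`, `CapacityError`, and `try_fill` are hypothetical names invented for this sketch, and the capacity of 2 is chosen only to mirror the `(2 < 3)` mismatch in the failure output.

```python
class CapacityError(Exception):
    """Raised when the toy child builder cannot hold another value."""


class ChildBuilder:
    """Hypothetical stand-in for a child builder with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = []

    def append(self, value):
        if len(self.values) >= self.capacity:
            raise CapacityError("child builder is full")
        self.values.append(value)


class StructBuilder:
    """Hypothetical struct builder that tracks its length separately."""

    def __init__(self, child):
        self.child = child
        self.length = 0

    def append_struct_first(self, value):
        # Struct slot is recorded before the child append, so a failure
        # in the child leaves the two lengths out of sync.
        self.length += 1
        self.child.append(value)

    def append_child_first(self, value):
        # Struct slot is recorded only after the child append succeeded,
        # so a failure leaves both lengths unchanged.
        self.child.append(value)
        self.length += 1


def try_fill(append, values):
    """Append values one by one; stop and return False on a capacity error."""
    for value in values:
        try:
            append(value)
        except CapacityError:
            return False
    return True


struct_first = StructBuilder(ChildBuilder(capacity=2))
try_fill(struct_first.append_struct_first, ["a", "b", "c"])
print(struct_first.length, len(struct_first.child.values))  # 3 2

child_first = StructBuilder(ChildBuilder(capacity=2))
try_fill(child_first.append_child_first, ["a", "b", "c"])
print(child_first.length, len(child_first.child.values))  # 2 2
```

With the struct-first ordering, the chunk that gets finished after the error claims three struct slots but has only two child values. With the child-first ordering, both lengths agree and the failed value can be retried in the next chunk.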
   
   This is per: 
   
   
https://github.com/apache/arrow/blob/86b7a84c9317fa08222eb63f6930bbb54c2e6d0b/cpp/src/arrow/array/builder_nested.h#L507-L508
   
   ### Are these changes tested?
   
   A unit test is added that previously failed with an invalid data error:
   
   ```
   >       tab = pa.Table.from_pandas(df)
   
   pyarrow/tests/test_pandas.py:4970: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ 
   pyarrow/table.pxi:3788: in pyarrow.lib.Table.from_pandas
       return cls.from_arrays(arrays, schema=schema)
   pyarrow/table.pxi:3890: in pyarrow.lib.Table.from_arrays
       result.validate()
   pyarrow/table.pxi:3170: in pyarrow.lib.Table.validate
       check_status(self.table.Validate())
   
   # ...
   
   FAILED pyarrow/tests/test_pandas.py::test_nested_chunking_valid - pyarrow.lib.ArrowInvalid: Column 0: In chunk 0: Invalid: List child array invalid: Invalid: Struct child array #0 has length smaller than expected for struct array (2 < 3)
   ```
   
   NOTE: This unit test uses about 7 GB of memory (max RSS) on my MacBook 
Pro, which might make CI challenging. I'm open to suggestions for reducing it.
   
   ### Are there any user-facing changes?
   
   No

