westonpace commented on issue #34639:
URL: https://github.com/apache/arrow/issues/34639#issuecomment-1536510963

   I've reopened this so we can verify, but I think it is actually doing the 
right thing.  However, I think there are other bugs in `to_struct_array` and 
`to_pandas` (:face_exhaling:)
   
   ```
   > pa.RecordBatch.from_struct_array(standard)
   ```
   
   This will give you a record batch that has length 1 with two child arrays 
that each have length 2.  This is allowed because it lets us use zero-copy.
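
To illustrate how a length-1 parent can sit on top of length-2 children, here is a toy pure-Python model. It is not pyarrow's implementation: `ToyStruct` and its methods are hypothetical names, and it assumes the shape arises from slicing, which the comment does not state. Slicing only moves the parent's `offset`/`length` window and reuses the same child lists, which is the zero-copy trade-off mentioned above.

```python
from dataclasses import dataclass


@dataclass
class ToyStruct:
    """Hypothetical stand-in for a struct array: shared children plus a window."""
    children: dict  # field name -> list of values (shared, never copied on slice)
    offset: int
    length: int

    def slice(self, offset, length):
        # Zero-copy: reuse the same child lists and only move the window.
        return ToyStruct(self.children, self.offset + offset, length)

    def to_pylist(self):
        # Logical rows: honor the parent's offset and length.
        names = list(self.children)
        return [
            {name: self.children[name][i] for name in names}
            for i in range(self.offset, self.offset + self.length)
        ]


full = ToyStruct({"col1": [1.0, 2.0], "col2": ["a", "b"]}, offset=0, length=2)
sliced = full.slice(0, 1)

print(sliced.length)                 # 1
print(len(sliced.children["col1"]))  # 2
print(sliced.to_pylist())            # [{'col1': 1.0, 'col2': 'a'}]
```

Any consumer that reads `children` directly, without applying the parent's window, sees two values where the logical length is one.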
   
   ```
   >>> x = pa.RecordBatch.from_struct_array(standard)
   >>> print(x) # Sadly, we don't print the contents here
   pyarrow.RecordBatch
   col1: double
   col2: string
   >>> print(x.num_rows) # This is correct
   1
   >>> print(x.column(0)) # This is arguably correct but misleading
   [
     1,
     2
   ]
   >>> print(x.to_pylist()) # This is correct
   [{'col1': 1.0, 'col2': 'a'}]
   >>> print(x.to_struct_array()) # This is wrong
   -- is_valid: all not null
   -- child 0 type: double
     [
       1,
       2
     ]
   -- child 1 type: string
     [
       "a",
       "b"
     ]
   >>> print(x.to_pandas()) # this is also wrong
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/array.pxi", line 852, in pyarrow.lib._PandasConvertible.to_pandas
     File "pyarrow/table.pxi", line 2506, in pyarrow.lib.RecordBatch._to_pandas
     File "pyarrow/table.pxi", line 4075, in pyarrow.lib.Table._to_pandas
     File "/home/pace/dev/arrow/python/pyarrow/pandas_compat.py", line 823, in table_to_blockmanager
       return BlockManager(blocks, axes)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/pace/miniconda3/envs/conbench3/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 1040, in __init__
       self._verify_integrity()
     File "/home/pace/miniconda3/envs/conbench3/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 1047, in _verify_integrity
       raise construction_error(tot_items, block.shape[1:], self.axes)
   ValueError: Shape of passed values is (2, 2), indices imply (1, 2)
   ```
   
   I will open two new issues, one for `to_struct_array` and one for 
`to_pandas`.  Arguably, we should also modify `to_batches` to push "short 
lengths" into the arrays themselves.  I'll have to ask on the mailing list 
whether it's legal for a record batch and its arrays to have different lengths.

