jonded94 commented on PR #8790:
URL: https://github.com/apache/arrow-rs/pull/8790#issuecomment-3536513427

   Hey, I pushed a version that does not use the PyCapsule ArrayStream 
interface for converting a `pyarrow.Table` to `Table` if the given Python 
object has a `to_batches()` method (which `pyarrow.Table` does). This is 
not necessarily intended to stay that way, but it is helpful for diagnosing 
where RecordBatch metadata is dropped.
   
   `pyarrow.Table.to_batches()` returns a `list[pyarrow.RecordBatch]`, which I 
explicitly convert to `Vec<RecordBatch>` in the `from_pyarrow_bound` function 
of `impl FromPyArrow for Table`. This is basically the equivalent of what I'm 
doing in the corresponding `impl IntoPyArrow for Table`: I'm not using the 
PyCapsule interface there either, but directly construct a `pyarrow.Table` out 
of `Vec<RecordBatch>` through `pyarrow.Table.from_batches(...)`.
   
   With that, I got RecordBatches with preserved metadata from `pyarrow.Table`, 
which in turn allowed me to drop the `schema_equals` function and instead do a 
full `schema == record_batch.schema()` check.
   
   I also checked on the Python side, using 
`pyarrow.RecordBatchReader.from_stream` on a `StreamWrapper` around a 
`pyarrow.Table`, that RecordBatches from the ArrayStream PyCapsule interface of 
a `pyarrow.Table` definitely still have their metadata. So the error has to be 
on the Rust side, somewhere in the `Box<dyn RecordBatchReader>` / `impl 
FromPyArrow for ArrowArrayStreamReader` code path.
   
   Potentially there is a slight misuse of the PyCapsule interface somewhere, 
as this definitely seems to return RecordBatches without metadata. I'm not too 
familiar with the low-level stuff there, but I'll try to investigate; help is 
appreciated!


