kylebarron commented on PR #8790: URL: https://github.com/apache/arrow-rs/pull/8790#issuecomment-3497654509
> one has to do slight workarounds to use them:

I think that's outdated for Python -> Rust. I haven't tried but you should be able to pass a `pyarrow.Table` directly into an `ArrowArrayStreamReader` on the Rust side, because it just looks for the `__arrow_c_stream__` method that exists either on the `Table` or the `pyarrow.RecordBatchReader`. But I assume there's no way today to easily return a `Table` from Rust to Python.

> At least I personally think having such a wrapper could be nice, since it simplifies stuff a bit when you anyways already have `Vec<RecordBatch>` on the Rust side somewhere or need to handle a `pyarrow.Table` on the Python side and want to have an easy method to generate such a thing from Rust.

I'm fine with that; and I think other maintainers would probably be fine with that too, since it's only a concept that exists in the Python integration.

I'm not sure I totally get your example. Seems bad to be returning a union of multiple types to Python. But seems reasonable to return a `Table` there. The alternative is to return a stream and have the user either iterate over it lazily or choose to materialize it with `pa.table(ParquetFile.read_row_group(...))`.

> And just for clarity, we unfortunately _need_ to have the entire Row group deserialized as Python objects because our data ingestion pipelines that consume this are expecting to have access to the entire row group in bulk, so streaming approaches are sadly not usable.

Well there's nothing stopping you from materializing the stream by passing it to `pa.table()`. You don't have to use the stream as a stream.

> Yes, in general, I much prefer the approach of `arro3` to be totally `pyarrow` agnostic. In our case unfortunately, we're right now still pretty hardcoded against `pyarrow` specifics and just use `arrow-rs` as a means to reduce memory load compared to reading & writing parquet datasets with `pyarrow` directly.

You can use `pyo3-arrow` with `pyarrow` as well, but I'm not opposed to adding this functionality to arrow-rs as well.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
