rustyconover opened a new pull request, #49286: URL: https://github.com/apache/arrow/pull/49286
### Rationale for this change

Addresses https://github.com/apache/arrow/issues/49285

When serializing many `RecordBatch` objects in a hot loop (e.g. streaming IPC to a socket, or writing to shared memory), `RecordBatch.serialize()` allocates a new buffer on every call. This creates unnecessary allocation pressure when the caller already knows the required size and could reuse a single buffer across calls.

### What changes are included in this PR?

Add an optional `buffer` parameter to `RecordBatch.serialize()`. When provided, the method serializes directly into the pre-allocated buffer instead of allocating a new one, and returns a zero-copy slice with the exact serialized size.

Changes:
- **`pyarrow/includes/libarrow.pxd`** — Add Cython declarations for the `GetRecordBatchSize(batch, options, &size)` and `SerializeRecordBatch(batch, options, out)` overloads
- **`pyarrow/table.pxi`** — Add the `buffer` parameter to `RecordBatch.serialize()`, with size validation and mutability checks
- **`pyarrow/tests/test_ipc.py`** — Add tests for round-trip correctness, oversized buffers, exact-size buffers, too-small buffers, and immutable buffers

### Are these changes tested?

Yes. The new test `test_serialize_record_batch_to_buffer` covers all cases.

### Are there any user-facing changes?

Yes. `RecordBatch.serialize()` now accepts an optional `buffer` keyword argument.

* GitHub Issue: #49285

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
