rustyconover opened a new pull request, #49286:
URL: https://github.com/apache/arrow/pull/49286

   ### Rationale for this change
   
   Addresses https://github.com/apache/arrow/issues/49285
   
   When serializing many `RecordBatch` objects in a hot loop (e.g. streaming 
IPC to a socket, writing to shared memory), `RecordBatch.serialize()` allocates 
a new buffer on every call. This creates unnecessary allocation pressure when 
the caller already knows the required size and could reuse a single buffer 
across calls.
   
   ### What changes are included in this PR?
   
   Add an optional `buffer` parameter to `RecordBatch.serialize()`. When 
provided, the method serializes directly into the pre-allocated buffer instead 
of allocating a new one, and returns a zero-copy slice with the exact 
serialized size.
   
   Changes:
   - **`pyarrow/includes/libarrow.pxd`** — Add Cython declarations for 
`GetRecordBatchSize(batch, options, &size)` and `SerializeRecordBatch(batch, 
options, out)` overloads
   - **`pyarrow/table.pxi`** — Add `buffer` parameter to 
`RecordBatch.serialize()` with size validation and mutability checks
   - **`pyarrow/tests/test_ipc.py`** — Add tests for round-trip correctness, 
oversized buffers, exact-size buffers, too-small buffers, and immutable buffers
   
   ### Are these changes tested?
   
   Yes. The new test `test_serialize_record_batch_to_buffer` covers the cases 
listed above (round-trip, oversized, exact-size, too-small, and immutable 
buffers).
   
   ### Are there any user-facing changes?
   
   Yes. `RecordBatch.serialize()` now accepts an optional `buffer` keyword 
argument.
   
   * GitHub Issue: #49285

