rustyconover opened a new issue, #49285:
URL: https://github.com/apache/arrow/issues/49285

   ### Describe the enhancement requested
   
   When serializing many `RecordBatch` objects in a hot loop (e.g. streaming 
IPC to a socket, writing to shared memory), `RecordBatch.serialize()` allocates 
a new buffer on every call. This creates unnecessary allocation pressure when 
the caller already knows the required size and could reuse a single buffer 
across calls. 
    
   It would be useful if `serialize()` accepted an optional `buffer` parameter 
so callers can provide a pre-allocated mutable buffer to serialize into 
directly.
   
   ## Example usage
   
   ```python
   import pyarrow as pa
   
   batches = [...] # many RecordBatches with the same schema
   
   # Pre-allocate once
   size = max(pa.ipc.get_record_batch_size(b) for b in batches)
   buf = pa.allocate_buffer(size)
   
   for batch in batches:
       result = batch.serialize(buffer=buf)
       send_over_network(result)  # result is a zero-copy slice of buf
   ```
   ```
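   
   For context, a partial workaround exists today: `pa.FixedSizeBufferWriter` wraps a pre-allocated mutable buffer, and an IPC stream writer can serialize into it. It is not equivalent to the proposed `serialize(buffer=...)`, though, because the stream writer also emits the schema message each time a stream is opened:
   
   ```python
   import pyarrow as pa
   
   # Partial workaround available today: write into a pre-allocated buffer
   # via pa.FixedSizeBufferWriter. The stream writer also writes the schema
   # message, so the layout differs from serialize(), which emits only the
   # record batch message.
   batch = pa.record_batch([pa.array([1, 2, 3])], names=["x"])
   
   buf = pa.allocate_buffer(4096)
   sink = pa.FixedSizeBufferWriter(buf)
   with pa.ipc.new_stream(sink, batch.schema) as writer:
       writer.write_batch(batch)
   
   # Reading back stops at the end-of-stream marker, so trailing unused
   # bytes in buf are ignored.
   restored = pa.ipc.open_stream(buf).read_next_batch()
   ```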
   
   ## New behavior
   
   - `batch.serialize(buffer=buf)` serializes directly into the provided buffer 
and returns a slice of it with the exact serialized size.
   - If the buffer is too small, a `ValueError` is raised with a message 
indicating the required vs. available size.
   - If the buffer is not mutable, a `ValueError` is raised.
   - When `buffer` is not provided, behavior is unchanged from today.
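   
   The proposed semantics can be emulated in pure Python today, although this still pays the extra allocation the feature would remove. `serialize_into` below is a hypothetical helper for illustration, not an existing pyarrow API:
   
   ```python
   import pyarrow as pa
   
   def serialize_into(batch, buf):
       # Hypothetical helper emulating the proposed serialize(buffer=...)
       # semantics. It still calls serialize() and copies, which is exactly
       # the allocation the requested feature would avoid natively.
       if not buf.is_mutable:
           raise ValueError("buffer must be mutable")
       data = batch.serialize()
       if buf.size < data.size:
           raise ValueError(
               f"buffer too small: need {data.size} bytes, have {buf.size}"
           )
       memoryview(buf)[: data.size] = memoryview(data)
       return buf.slice(0, data.size)  # zero-copy slice of buf
   
   batch = pa.record_batch([pa.array([1, 2, 3])], names=["x"])
   buf = pa.allocate_buffer(pa.ipc.get_record_batch_size(batch))
   result = serialize_into(batch, buf)
   restored = pa.ipc.read_record_batch(result, batch.schema)
   ```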
   
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
