raulcd commented on PR #48963:
URL: https://github.com/apache/arrow/pull/48963#issuecomment-3871235959

   Thanks for your comment and thanks for sharing the details about AI usage.
   
   The functionality you are proposing could potentially make sense, even though 
it may require copies (i.e. it is not always zero-copy), the same as when 
concatenating batches. As you suggest, the best place for functionality like 
this would be next to `concat_batches` in `table.pxi` for the Python bindings 
implementation:
   
https://github.com/apache/arrow/blob/a82edf90ce66eb9a9a9e3bbac514e5d51f531c1f/python/pyarrow/table.pxi#L6297
   
   Part of this functionality is already supported today on the C++ side, with 
something like:
   ```cpp
     // Zero-copy: wraps the batches as ChunkedArray-backed columns
     // (Table::FromRecordBatches returns a Result, so it must be unwrapped)
     ARROW_ASSIGN_OR_RAISE(std::shared_ptr<Table> table,
                           Table::FromRecordBatches(batches));

     // Read the table back out with a new chunk size
     TableBatchReader reader(*table);
     reader.set_chunksize(desired_row_count);
   ```
   The reader will return record batches of the desired row count, but 
producing an output batch might require concatenating more than one original 
RecordBatch, depending on their sizes, which would require copying the data 
into a single contiguous memory buffer.
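   For reference, a rough sketch of how the same mechanism surfaces in the 
existing Python API, using `Table.from_batches` and 
`Table.to_batches(max_chunksize=...)` (the data below is illustrative; and, if 
I understand the reader's behavior correctly, `max_chunksize` is only an upper 
bound, since the reader slices chunks but does not merge them unless the table 
is first made contiguous):
   ```python
   import pyarrow as pa

   # Three small batches of 4 rows each (illustrative data)
   batches = [
       pa.RecordBatch.from_pydict({"x": list(range(i * 4, i * 4 + 4))})
       for i in range(3)
   ]

   # Zero-copy: wraps the batches as ChunkedArray-backed columns
   table = pa.Table.from_batches(batches)

   # max_chunksize only caps the batch size; chunks are sliced,
   # not merged, so each emitted batch here still has at most 4 rows
   capped = table.to_batches(max_chunksize=5)

   # To get exactly-sized batches, first copy the columns into
   # contiguous buffers with combine_chunks(), then slice
   combined = table.combine_chunks()
   resized = combined.to_batches(max_chunksize=5)
   print([b.num_rows for b in resized])
   ```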
   
   As for the bytes-based side of your proposal, I don't think there's 
functionality to support that today on the C++ side.


-- 
This is an automated message from the Apache Git Service.