raulcd commented on PR #48963: URL: https://github.com/apache/arrow/pull/48963#issuecomment-3871235959
Thanks for your comment and thanks for sharing the details about AI usage. The functionality you are proposing can potentially make sense, even though it may not be zero-copy, same as with `concat_batches`. As you suggest, the best place for a functionality like this would be next to `concat_batches` in `table.pxi` for the Python bindings implementation: https://github.com/apache/arrow/blob/a82edf90ce66eb9a9a9e3bbac514e5d51f531c1f/python/pyarrow/table.pxi#L6297

Part of this functionality is already supported today with something like:

```cpp
// Zero-copy: wraps the batches as ChunkedArrays.
// (FromRecordBatches returns Result<std::shared_ptr<Table>>.)
auto table = Table::FromRecordBatches(batches).ValueOrDie();

// Read the table back out in the new chunk size.
TableBatchReader reader(*table);
reader.set_chunksize(desired_row_count);
```

The reader will return record batches of the desired row count, but producing them might require concatenating more than one original `RecordBatch`, depending on the sizes, which would require copying into a single contiguous memory buffer. As for splitting by byte size, I don't think there is functionality supporting that today on the C++ side.
