drin commented on code in PR #14008: URL: https://github.com/apache/arrow/pull/14008#discussion_r958951760
##########
docs/source/cpp/tables.rst:
##########
@@ -77,6 +77,17 @@ has a schema which must match its arrays' datatypes.
 Record batches are a convenient unit of work for various serialization
 and computation functions, possibly incremental.
 
+.. image:: tables-versus-record-batches.svg
+   :alt: A graphical representation of an Arrow Table and a Record Batch, with
+         structure as described in text above.
+
+Because record batches can be represented as a struct array, they can be
+exported through the C data interface between implementations. Tables and
+chunked arrays, on the other hand, are concepts in the C++ implementation, not
+in the Arrow format itself, so they aren't directly portable.
+
+However, a table can be converted to and built from a sequence of record
+batches easily without needing to copy the underlying array buffers.

Review Comment:
   If the columns have different chunking, wouldn't it be necessary to
   normalize when converting to a RecordBatch?

   That being said, a Table is basically a structural view of a RecordBatch:
   each RecordBatch Array can be wrapped in a ChunkedArray trivially because
   the buffers are separate and a ChunkedArray is *mostly* an ArrayVector
   (though, I assume not entirely).
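To make the chunking question concrete, here is a minimal sketch of what "normalizing" could mean without copying: cut the table at every chunk boundary of every column, so that each resulting batch lies inside a single chunk of each column and can be assembled from slices. `AlignedBatchLengths` is a hypothetical helper, not part of the Arrow API; it uses only the standard library and works on chunk lengths rather than real arrays. I believe `arrow::TableBatchReader` takes a broadly similar approach, but that is an assumption worth checking against the implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the Arrow API): given the chunk lengths of
// each column, return the row counts of the record batches obtained by cutting
// at every chunk boundary of every column. Assumes all columns have the same
// total number of rows.
std::vector<int64_t> AlignedBatchLengths(
    const std::vector<std::vector<int64_t>>& column_chunk_lengths) {
  const std::size_t num_columns = column_chunk_lengths.size();
  std::vector<std::size_t> chunk_index(num_columns, 0);  // current chunk per column
  std::vector<int64_t> offset(num_columns, 0);  // rows already consumed in that chunk
  std::vector<int64_t> batch_lengths;

  while (true) {
    int64_t step = -1;  // rows in the next batch; -1 means "not yet known"
    for (std::size_t i = 0; i < num_columns; ++i) {
      const auto& chunks = column_chunk_lengths[i];
      // Skip chunks that are fully consumed (this also skips empty chunks).
      while (chunk_index[i] < chunks.size() &&
             offset[i] == chunks[chunk_index[i]]) {
        ++chunk_index[i];
        offset[i] = 0;
      }
      // A column with no rows left means the table is exhausted.
      if (chunk_index[i] == chunks.size()) return batch_lengths;
      const int64_t remaining = chunks[chunk_index[i]] - offset[i];
      step = (step < 0) ? remaining : std::min(step, remaining);
    }
    if (step < 0) return batch_lengths;  // no columns at all
    // The batch ends at the nearest chunk boundary across all columns.
    batch_lengths.push_back(step);
    for (std::size_t i = 0; i < num_columns; ++i) offset[i] += step;
  }
}
```

For two columns chunked as `{3, 3}` and `{2, 4}`, this yields batches of 2, 1, and 3 rows. Each batch only needs an offset and a length into an existing chunk, which is why the conversion can avoid copying buffers even when the columns are chunked differently. The cost is that it may produce more, smaller batches than either column has chunks.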
