[GitHub] [arrow] drin commented on a diff in pull request #14008: ARROW-13454: [C++][Docs] Tables vs Record Batches

GitBox Tue, 30 Aug 2022 14:38:12 -0700


drin commented on code in PR #14008:
URL: https://github.com/apache/arrow/pull/14008#discussion_r958951760



##########
docs/source/cpp/tables.rst:
##########
@@ -77,6 +77,17 @@ has a schema which must match its arrays' datatypes.
 Record batches are a convenient unit of work for various serialization
 and computation functions, possibly incremental.
 
+.. image:: tables-versus-record-batches.svg
+   :alt: A graphical representation of an Arrow Table and a Record Batch, with
+         structure as described in text above.
+
+Because each record batch can be represented as a struct array, record
+batches can be passed between implementations through the C data interface.
+Tables and chunked arrays, on the other hand, are concepts of the C++
+implementation rather than of the Arrow format itself, so they are not
+directly portable.
+
+However, a table can easily be built from, and converted to, a sequence of
+record batches without copying the underlying array buffers.

Review Comment:
   If the columns have different chunking, wouldn't it be necessary to 
normalize when converting to a RecordBatch?
   
   That being said, a Table is basically a structural view of a RecordBatch: 
each RecordBatch column can be wrapped in a ChunkedArray trivially because the 
buffers are separate and a ChunkedArray is *mostly* an ArrayVector (though I 
assume not entirely).
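The question about differing chunk boundaries can be explored with a toy model. The following is a minimal sketch in plain C++ that does not use Arrow at all: `ToyArray`, `ToyRecordBatch`, `ToyTable`, `MakeArray`, `TableFromBatches`, and `BatchesFromTable` are hypothetical names, and schemas, types, and null bitmaps are omitted. It assumes arrays are immutable offset/length views over shared buffers, so slicing is zero-copy. Under that assumption, differing chunking does not force a copy: the table can be cut at the union of all columns' chunk boundaries, at the price of possibly producing more, smaller batches. As far as I understand, Arrow C++'s `arrow::TableBatchReader` slices in a similar way and `arrow::Table::FromRecordBatches` covers the other direction, but check the API reference rather than relying on this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy model, NOT Arrow's API. An array is an immutable view
// (offset + length) over a shared buffer, so slicing never copies values.
struct ToyArray {
  std::shared_ptr<const std::vector<int64_t>> buffer;
  std::size_t offset = 0;
  std::size_t length = 0;

  // Zero-copy: the slice shares the same buffer.
  ToyArray Slice(std::size_t off, std::size_t len) const {
    return ToyArray{buffer, offset + off, len};
  }
  int64_t Value(std::size_t i) const { return (*buffer)[offset + i]; }
};

ToyArray MakeArray(std::vector<int64_t> values) {
  const std::size_t n = values.size();
  return ToyArray{std::make_shared<std::vector<int64_t>>(std::move(values)), 0, n};
}

// A record batch: equally long columns (schema omitted for brevity).
struct ToyRecordBatch {
  std::vector<ToyArray> columns;
  std::size_t num_rows = 0;
};

// A chunked array is a sequence of arrays; a table is a set of chunked arrays
// with equal total length but possibly different chunk boundaries.
using ToyChunkedArray = std::vector<ToyArray>;
struct ToyTable {
  std::vector<ToyChunkedArray> columns;
};

// Batches -> table: each batch contributes one chunk per column (no copies).
ToyTable TableFromBatches(const std::vector<ToyRecordBatch>& batches) {
  ToyTable table;
  if (batches.empty()) return table;
  table.columns.resize(batches.front().columns.size());
  for (const auto& batch : batches) {
    for (std::size_t c = 0; c < table.columns.size(); ++c) {
      table.columns[c].push_back(batch.columns[c]);
    }
  }
  return table;
}

// Table -> batches: walk all columns in lockstep and cut wherever ANY column
// has a chunk boundary, slicing (not copying) the current chunks.
std::vector<ToyRecordBatch> BatchesFromTable(const ToyTable& table) {
  std::vector<ToyRecordBatch> batches;
  const std::size_t n = table.columns.size();
  std::vector<std::size_t> chunk(n, 0);  // current chunk index per column
  std::vector<std::size_t> pos(n, 0);    // rows consumed within that chunk
  while (n > 0) {
    std::size_t run = SIZE_MAX;
    for (std::size_t c = 0; c < n; ++c) {
      const ToyChunkedArray& col = table.columns[c];
      // Advance past fully consumed (or empty) chunks.
      while (chunk[c] < col.size() && pos[c] == col[chunk[c]].length) {
        ++chunk[c];
        pos[c] = 0;
      }
      // Equal total lengths mean every column runs out at the same time.
      if (chunk[c] == col.size()) return batches;
      run = std::min(run, col[chunk[c]].length - pos[c]);
    }
    ToyRecordBatch batch;
    batch.num_rows = run;
    for (std::size_t c = 0; c < n; ++c) {
      batch.columns.push_back(table.columns[c][chunk[c]].Slice(pos[c], run));
      pos[c] += run;
    }
    batches.push_back(std::move(batch));
  }
  return batches;
}
```

For example, a two-column table chunked as 3 + 2 rows in one column and 1 + 4 rows in the other comes back as three batches of 1, 2, and 2 rows, all sharing buffers with the original chunks; passing those batches to `TableFromBatches` yields a table with three chunks per column. The trade-off is that the more the columns' chunkings disagree, the more (and smaller) batches you get; normalizing by concatenating chunks would avoid that, but only by copying.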


