sergun commented on issue #38912:
URL: https://github.com/apache/arrow/issues/38912#issuecomment-1829720950

   > If you can work with record batches I would suggest using `to_struct_array()` method:
   > 
   > ```python
   > import pyarrow as pa
   > batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": [3, 4, 5]})
   > struct_array = batch.to_struct_array()
   > batch_result = pa.RecordBatch.from_arrays([struct_array], names=["c"])
   > # pyarrow.RecordBatch
   > # c: struct<a: int64, b: int64>
   > #   child 0, a: int64
   > #   child 1, b: int64
   > # ----
   > # c: -- is_valid: all not null
   > # -- child 0 type: int64
   > # [1,2,3]
   > # -- child 1 type: int64
   > # [3,4,5]
   > ```
   > 
   > If you need to work with tables then you can do the same for each individual chunk:
   > 
   > ```python
   > # I think this should work
   > table = pa.table({"a": [1, 2, 3], "b": [3, 4, 5]})
   > batches = []
   > for b in table.to_batches():
   >     batches.append(pa.RecordBatch.from_arrays([b.to_struct_array()], names=["c"]))
   > table_result = pa.Table.from_batches(batches)
   > # pyarrow.Table
   > # c: struct<a: int64, b: int64>
   > #   child 0, a: int64
   > #   child 1, b: int64
   > # ----
   > # c: [
   > #   -- is_valid: all not null
   > #   -- child 0 type: int64
   > # [1,2,3]
   > #   -- child 1 type: int64
   > # [3,4,5]]
   > ```
   
   Thanks a lot @AlenkaF !
   
   Am I right that such Table <-> Batches transformations cost close to zero, according to
   https://arrow.apache.org/docs/cpp/tables.html#record-batches ?
   
   "However, a table can be converted to and built from a sequence of record 
batches easily without needing to copy the underlying array buffers. A table 
can be streamed as an arbitrary number of record batches using a 
[arrow::TableBatchReader](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow16TableBatchReaderE).
 Conversely, a logical sequence of record batches can be assembled to form a 
table using one of the 
[arrow::Table::FromRecordBatches()](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow5Table17FromRecordBatchesERKNSt6vectorINSt10shared_ptrI11RecordBatchEEEE)
 factory function overloads."
   
   
   


