sergun commented on issue #38912:
URL: https://github.com/apache/arrow/issues/38912#issuecomment-1829720950
> If you can work with record batches, I would suggest using the `to_struct_array()` method:
>
> ```python
> import pyarrow as pa
> batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": [3, 4, 5]})
> struct_array = batch.to_struct_array()
> batch_result = pa.RecordBatch.from_arrays([struct_array], names=["c"])
> # pyarrow.RecordBatch
> # c: struct<a: int64, b: int64>
> # child 0, a: int64
> # child 1, b: int64
> # ----
> # c: -- is_valid: all not null
> # -- child 0 type: int64
> # [1,2,3]
> # -- child 1 type: int64
> # [3,4,5]
> ```
>
> If you need to work with tables, then you can do the same for each individual chunk:
>
> ```python
> # I think this should work
> table = pa.table({"a": [1, 2, 3], "b": [3, 4, 5]})
> batches = []
> for b in table.to_batches():
>     batches.append(pa.RecordBatch.from_arrays([b.to_struct_array()], names=["c"]))
> table_result = pa.Table.from_batches(batches)
> # pyarrow.Table
> # c: struct<a: int64, b: int64>
> # child 0, a: int64
> # child 1, b: int64
> # ----
> # c: [
> # -- is_valid: all not null
> # -- child 0 type: int64
> # [1,2,3]
> # -- child 1 type: int64
> # [3,4,5]]
> ```
Thanks a lot, @AlenkaF!
Am I right that such Table <-> RecordBatch conversions cost close to zero?
According to https://arrow.apache.org/docs/cpp/tables.html#record-batches:
"However, a table can be converted to and built from a sequence of record
batches easily without needing to copy the underlying array buffers. A table
can be streamed as an arbitrary number of record batches using a
[arrow::TableBatchReader](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow16TableBatchReaderE).
Conversely, a logical sequence of record batches can be assembled to form a
table using one of the
[arrow::Table::FromRecordBatches()](https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow5Table17FromRecordBatchesERKNSt6vectorINSt10shared_ptrI11RecordBatchEEEE)
factory function overloads."