kylebarron commented on PR #35780:
URL: https://github.com/apache/arrow/pull/35780#issuecomment-1839842156

   But then you call
   
   ```ts
   table.batches[0].numRows
   // 1
   ```
   
   Implicitly, then, every `Data` instance in a batch must have the same number of rows.
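To make that invariant concrete, here is a minimal pure-Python sketch. `make_batch` and `num_rows` are hypothetical helpers, not Arrow APIs, and columns are modeled as plain lists rather than Arrow `Data` instances.

```python
from __future__ import annotations


def make_batch(columns: dict[str, list]) -> dict[str, list]:
    """Hypothetical record-batch constructor: every column must share one length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have mismatched lengths: {lengths}")
    return columns


def num_rows(batch: dict[str, list]) -> int:
    # Because all columns share a length, any single column gives the row count.
    return len(next(iter(batch.values()), []))
```

With this model, `make_batch({"v1": [1], "v2": [4]})` is accepted with one row. `make_batch({"v1": [1, 2], "v2": [4]})` raises, because a single `numRows` cannot describe both columns.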
   
   When I tested this in Python, the `Table` constructor surprisingly _allows_ it, but `to_batches` then re-slices the chunks into aligned record batches:
   
   ```py
   import pyarrow as pa
   
   d1 = [
       pa.array([1, 2], pa.int32()),
       pa.array([3], pa.int32()),
   ]
   d2 = [
       pa.array([4], pa.int32()),
       pa.array([5, 6], pa.int32()),
   ]
   
   c1 = pa.chunked_array(d1)
   c2 = pa.chunked_array(d2)
   table = pa.table({"v1": c1, "v2": c2})
   table
   # pyarrow.Table
   # v1: int32
   # v2: int32
   # ----
   # v1: [[1,2],[3]]
   # v2: [[4],[5,6]]
   
   # to_batches splits at every column's chunk boundaries; here that yields three batches of length 1
   table.to_batches()
   # [pyarrow.RecordBatch
   #  v1: int32
   #  v2: int32
   #  ----
   #  v1: [1]
   #  v2: [4],
   #  pyarrow.RecordBatch
   #  v1: int32
   #  v2: int32
   #  ----
   #  v1: [2]
   #  v2: [5],
   #  pyarrow.RecordBatch
   #  v1: int32
   #  v2: int32
   #  ----
   #  v1: [3]
   #  v2: [6]]
   ```
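The regrouping can be imitated without pyarrow. The sketch below assumes a model of the behavior shown above; it is not pyarrow's implementation. It takes the union of every column's chunk boundaries and slices all columns at those offsets, so each output batch falls within a single chunk of every column. Columns are modeled as lists of Python lists, and `align_chunks` is a hypothetical name.

```python
from __future__ import annotations


def align_chunks(columns: dict[str, list[list]]) -> list[dict[str, list]]:
    """Split chunked columns into row-aligned batches at all chunk boundaries."""
    boundaries = {0}
    flat: dict[str, list] = {}
    for name, chunks in columns.items():
        offset = 0
        for chunk in chunks:
            # Record the end offset of each chunk, e.g. [[1, 2], [3]] -> 2, 3.
            offset += len(chunk)
            boundaries.add(offset)
        # Concatenate the chunks so slicing by absolute row offset is easy.
        flat[name] = [value for chunk in chunks for value in chunk]
    if len({len(values) for values in flat.values()}) > 1:
        raise ValueError("all columns must have the same total length")
    cuts = sorted(boundaries)
    # Consecutive cut points are unique, so every batch has at least one row.
    return [
        {name: values[start:end] for name, values in flat.items()}
        for start, end in zip(cuts, cuts[1:])
    ]
```

For the table above, `align_chunks({"v1": [[1, 2], [3]], "v2": [[4], [5, 6]]})` produces the same three one-row batches. The batch length follows the chunk layout rather than always being 1. For example, `{"a": [[1, 2, 3, 4]], "b": [[5, 6], [7, 8]]}` yields two batches of two rows each.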
   

