amoeba commented on issue #36295:
URL: https://github.com/apache/arrow/issues/36295#issuecomment-2374573245

   Hey @zanmato1984, I think you're right that this is now fixed on main. I can 
reproduce the issue with PyArrow 17, but with PyArrow built from main I no 
longer get the data corruption.
   
   ```
   >>> pyarrow.__version__
   '18.0.0.dev372+gfc2e018ef1.d20240925'
   ```
   
   And when I run the script with `COLUMN_COUNT = 5`:
   
   ```
   ❯ python repro.py
   -------------------- ORIGINAL --------------------
   pyarrow.Table
   index: uint64
   0: uint64
   1: uint64
   2: uint64
   3: uint64
   4: uint64
   ----
   index: [[0,1,2,3,4,...,99999995,99999996,99999997,99999998,99999999]]
   0: [[0,0,0,0,0,...,0,0,0,0,0]]
   1: [[1,1,1,1,1,...,1,1,1,1,1]]
   2: [[2,2,2,2,2,...,2,2,2,2,2]]
   3: [[3,3,3,3,3,...,3,3,3,3,3]]
   4: [[4,4,4,4,4,...,4,4,4,4,4]]
   -------------- GROUP_BY / AGGREGATE --------------
   pyarrow.Table
   index: uint64
   0: uint64
   1: uint64
   2: uint64
   3: uint64
   4: uint64
   ----
   index: [[0,1,2,3,4,...,98566139,98566140,98566141,98566142,98566143]]
   0: [[0,0,0,0,0,...,0,0,0,0,0]]
   1: [[1,1,1,1,1,...,1,1,1,1,1]]
   2: [[2,2,2,2,2,...,2,2,2,2,2]]
   3: [[3,3,3,3,3,...,3,3,3,3,3]]
   4: [[4,4,4,4,4,...,4,4,4,4,4]]
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

