pitrou commented on issue #46814:
URL: https://github.com/apache/arrow/issues/46814#issuecomment-3345763763

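Ok, it looks like all chunks of the group key column (`uri`) are actually slices of a single physical String array. The problem, then, is that the 32-bit offsets of that String array become negative at some point because the underlying data is too large: once the cumulative string data exceeds 2147483647 bytes (the `int32` maximum), the offsets wrap around to negative values: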
   ```python
   >>> binary_offsets[34170796-5:34170796+5]
   array([ 2147483331,  2147483401,  2147483471,  2147483541,  2147483611,
          -2147483615, -2147483545, -2147483475, -2147483405, -2147483335],
         dtype=int32)
   ```
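The wraparound can be reproduced with plain integer arithmetic. Below is a minimal sketch. It assumes, purely for illustration, that every value in this region is 70 bytes long, which is inferred from the spacing of the offsets printed above. `wrap_int32` and `string_offsets` are hypothetical helpers, not Arrow APIs.

```python
# Sketch: reproduce the int32 offset wraparound from the excerpt above.
# Assumption (illustrative only): each string in this region is 70 bytes,
# inferred from the spacing of the printed offsets.

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed offset can hold


def wrap_int32(x: int) -> int:
    """Reinterpret a Python int as a two's-complement 32-bit integer."""
    return (x + 2**31) % 2**32 - 2**31


def string_offsets(lengths, start=0):
    """Return String-style offsets (len(lengths) + 1 entries), each value
    truncated to int32 as it would be in a 32-bit offset buffer."""
    total = start
    offsets = [wrap_int32(total)]
    for n in lengths:
        total += n  # true (unbounded) byte position
        offsets.append(wrap_int32(total))
    return offsets


# Start just below the 2 GiB limit, at the first offset in the excerpt.
offs = string_offsets([70] * 9, start=2147483331)
print(offs)

# Offsets must be non-decreasing; the first drop marks the overflow point.
first_bad = next(i for i in range(1, len(offs)) if offs[i] < offs[i - 1])
print(first_bad)  # -> 5
```

Under that assumption the printed offsets match the excerpt exactly. This suggests that a chunk slicing into the parent array past the 2147483647-byte mark sees negative offsets, however small the chunk itself is.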
   
   @zanmato1984 Is this a well-known bug?
