pitrou commented on issue #46814:
URL: https://github.com/apache/arrow/issues/46814#issuecomment-3345763763
Ok, it looks like all chunks of the group key column (`uri`) are actually
slices of a single physical String array. The problem is then that the
32-bit offsets of that String array wrap around and become negative at some
point, because the underlying data is larger than a signed 32-bit offset
(at most 2**31 - 1 bytes) can address:
```python
>>> binary_offsets[34170796-5:34170796+5]
array([ 2147483331,  2147483401,  2147483471,  2147483541,  2147483611,
       -2147483615, -2147483545, -2147483475, -2147483405, -2147483335],
      dtype=int32)
```
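For illustration, the wraparound can be reproduced with plain NumPy. This is only a sketch: it assumes every value is 70 bytes long (which is what the spacing of the offsets above suggests), and the names `value_length`, `true_offsets` and `stored_offsets` are made up for the example, not taken from Arrow.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2147483647

# Assumption: each string is 70 bytes long, as the offset spacing above suggests.
value_length = 70
start = 2147483331  # first offset shown in the excerpt above

# Offsets as they should be, computed in 64 bits.
true_offsets = start + value_length * np.arange(10, dtype=np.int64)

# Offsets as a 32-bit buffer would hold them: the cast wraps modulo 2**32.
stored_offsets = true_offsets.astype(np.int32)

# Index of the first offset that no longer fits in a signed 32-bit integer.
first_bad = int(np.argmax(true_offsets > INT32_MAX))

print(stored_offsets)
print("first wrapped index:", first_bad)
```

Under that assumption, the sixth offset should be 2147483681, which is past `INT32_MAX` and wraps to -2147483615, matching the values above.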
@zanmato1984 Is this a well-known bug?