pitrou commented on issue #44164:
URL: https://github.com/apache/arrow/issues/44164#issuecomment-2361303864

   > Previous bug reports with the offset overflow are mostly around very large 
strings. In this case, we don't have any individual string that is larger than 
2GB. Instead, we get the error when we are above a certain total size.
   
   This is expected. Arrow's binary and string types locate each value through
32-bit signed _offsets_ into a single shared data buffer, so a string array whose
total data size exceeds 2 GiB cannot be represented, even when every individual
string is small.
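To make the arithmetic concrete, here is a plain-Python sketch. `fits_in_string_array` is a hypothetical helper, not a pyarrow API. It assumes the standard layout, where the last offset equals the sum of all value byte lengths, so the limit applies to the total and not to any single string.

```python
# Largest offset a 32-bit signed integer can hold (just under 2 GiB).
INT32_MAX = 2**31 - 1


def fits_in_string_array(byte_lengths):
    """Hypothetical helper (not part of pyarrow).

    Returns True if values with the given UTF-8 byte lengths can be stored
    in one Arrow `string` array. The final offset equals the sum of all
    lengths, so it is the *total* that must not exceed INT32_MAX.
    """
    return sum(byte_lengths) <= INT32_MAX


# Two 1 GiB values: each is well under 2 GiB, but together they overflow.
print(fits_in_string_array([2**30, 2**30]))      # False
print(fits_in_string_array([2**30, 2**30 - 1]))  # True
```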
   
   You should either keep the chunks separate (i.e. don't call
`combine_chunks`) or first convert your string column to large_string
(which uses 64-bit offsets).


-- 
This is an automated message from the Apache Git Service.