pitrou commented on issue #44164: URL: https://github.com/apache/arrow/issues/44164#issuecomment-2361303864
> Previous bug reports with the offset overflow are mostly around very large strings. In this case, we don't have any individual string that is larger than 2GB. Instead, we get the error when we are above a certain total size.

This is expected anyway. The binary and string types in Arrow store _offsets_ into the data as 32-bit signed integers, so a single string array with a total data size greater than 2 GiB is not possible. You should either keep the chunks separate (i.e. don't call `combine_chunks`) or first convert your string column to `large_string` (which uses 64-bit offsets).
