AlenkaF commented on issue #36388: URL: https://github.com/apache/arrow/issues/36388#issuecomment-1623362606
Oh sorry, the last link to the C++ is wrong. I meant to add this: https://github.com/apache/arrow/blob/fb8760e4d749c718d95fa784600f51e8b6fd2f43/cpp/src/arrow/array/util.cc#L851

After talking to @jorisvandenbossche about this issue, I would like to propose a fix for it on the C++ side here (contributions welcome).

The _Negative offsets in binary array_ message comes from `CreateOffsetsBuffer`: https://github.com/apache/arrow/blob/fb8760e4d749c718d95fa784600f51e8b6fd2f43/cpp/src/arrow/array/util.cc#L808-L817

It appears when `value_length` * (the number of repetitions) exceeds the `int64` limit.

We could add an overflow check in the `RepeatedArrayFactory` for the binary type: https://github.com/apache/arrow/blob/fb8760e4d749c718d95fa784600f51e8b6fd2f43/cpp/src/arrow/array/util.cc#L638-L649

using `MultiplyWithOverflow`, similar to what we do here: https://github.com/apache/arrow/blob/fb8760e4d749c718d95fa784600f51e8b6fd2f43/python/pyarrow/src/arrow/python/datetime.h#L162-L163
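To make the proposed check concrete, here is a minimal standalone sketch of the idea, not the actual Arrow patch. `MultiplyOverflows` is a hypothetical stand-in for Arrow's `MultiplyWithOverflow`, and `RepeatedSizeFits` is a hypothetical helper name. The sketch assumes the check should also confirm that the total byte size fits in the array's offset type (`int32_t` for binary, `int64_t` for large binary), which may go beyond what the final fix needs.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for Arrow's MultiplyWithOverflow.
// Returns true if a * b would overflow int64_t; otherwise stores the
// product in *out and returns false. Negative inputs are treated as
// invalid in this sketch, since lengths are never negative.
inline bool MultiplyOverflows(int64_t a, int64_t b, int64_t* out) {
  assert(out != nullptr);
  if (a < 0 || b < 0) return true;
  if (a != 0 && b > std::numeric_limits<int64_t>::max() / a) return true;
  *out = a * b;
  return false;
}

// Hypothetical helper: can a value of `value_length` bytes be repeated
// `length` times without the final offset overflowing OffsetType?
template <typename OffsetType>
bool RepeatedSizeFits(int64_t value_length, int64_t length) {
  int64_t total = 0;
  if (MultiplyOverflows(value_length, length, &total)) return false;
  return total <= static_cast<int64_t>(std::numeric_limits<OffsetType>::max());
}
```

In the factory, a check of this kind would presumably run before `CreateOffsetsBuffer` is called and return an error status instead of letting the offsets wrap around to negative values.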
