zeroshade commented on issue #38458: URL: https://github.com/apache/arrow/issues/38458#issuecomment-1785476185
So, at minimum, this should be fixed to be consistent between the `Binary` and `String` types; the inconsistency was never intended.

That said, I believe the intent of the `ValueOffset` method is to provide the starting offset of the value at index `i`, which means the `Binary` type is correct in having the `>=` there: there is no value at an index equal to the length of the slice/array, so there is no *starting* offset to return for it. If you need the end of the data, you'd instead use `ValueLen` on the last element, or grab the last entry from `ValueOffsets`. If you're trying to get an element's size, the `ValueLen` method already does that for you. If you're trying to get the actual value, the `Value` method returns a slice over the underlying buffer without copying.

So I believe the optimal solution here is to expose the `ValueLen` method on the `BinaryLike` interface, so that you don't need to calculate these byte widths manually and can instead just call that method. What do you both think?

> I also find it unsettling that ValueOffset/ValueOffset64 take in int row index parameters whereas slicing methods take in int64

This was done for consistency: most of the existing row-index parameters in the array types use `int` rather than an explicit `int64` (likely for convenience). In theory this would only affect someone running on a 32-bit machine with enough rows that the count would not fit in an int32.
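To make the offset semantics concrete, here is a minimal sketch of how a variable-length binary array is laid out and why `>=` is the right bound for `ValueOffset`. The `mockBinary` type and its methods are hypothetical stand-ins for illustration, not the real `arrow/array` API: `n` values share one data buffer plus `n+1` offsets, and value `i` occupies `data[offsets[i]:offsets[i+1]]`.

```go
package main

import "fmt"

// mockBinary is a hypothetical illustration of the Arrow variable-length
// binary layout: len(offsets) == Len()+1, and value i spans
// data[offsets[i]:offsets[i+1]] in the shared buffer.
type mockBinary struct {
	offsets []int32
	data    []byte
}

// Len returns the number of values (one fewer than the number of offsets).
func (b *mockBinary) Len() int { return len(b.offsets) - 1 }

// ValueOffset mirrors the stricter Binary behavior discussed above: i must
// identify an actual value, so i == Len() is rejected with >= because there
// is no value at that index to take a *starting* offset of.
func (b *mockBinary) ValueOffset(i int) int32 {
	if i < 0 || i >= b.Len() {
		panic("index out of range")
	}
	return b.offsets[i]
}

// ValueLen returns the byte width of value i, so callers never have to do
// the offsets[i+1]-offsets[i] arithmetic themselves.
func (b *mockBinary) ValueLen(i int) int {
	return int(b.offsets[i+1] - b.offsets[i])
}

// Value returns value i as a sub-slice of the shared buffer (no copy).
func (b *mockBinary) Value(i int) []byte {
	return b.data[b.offsets[i]:b.offsets[i+1]]
}

func main() {
	// Three values packed into one buffer: "ab", "" (empty), "cde".
	arr := &mockBinary{offsets: []int32{0, 2, 2, 5}, data: []byte("abcde")}
	fmt.Println(arr.ValueLen(2))      // 3
	fmt.Println(string(arr.Value(0))) // ab
	fmt.Println(arr.ValueLen(1))      // 0
}
```

Exposing `ValueLen` on the shared interface means callers never touch the offsets buffer directly, which also sidesteps the off-by-one question entirely.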
