HaoYang670 commented on code in PR #1633:
URL: https://github.com/apache/arrow-rs/pull/1633#discussion_r862620323
##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -86,6 +86,52 @@ fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(
Ok(make_array(data))
}
+fn fixed_size_binary_substring(
+ array: &FixedSizeBinaryArray,
+ old_len: i32,
+ start: i32,
+ length: Option<i32>,
+) -> Result<ArrayRef> {
+ let new_start = if start >= 0 {
+ start.min(old_len)
+ } else {
+ (old_len + start).max(0)
+ };
+ let new_len = match length {
+ Some(len) => len.min(old_len - new_start),
+ None => old_len - new_start,
+ };
Review Comment:
I think the answer is no for several reasons:
1. the definition of the public function makes sure that `length` is always
>=0
```rust
pub fn substring(array: &dyn Array, start: i64, length: Option<u64>) ->
Result<ArrayRef> {
```
2. the definition of the data type of `FixedSizeBinary` seems to allow
negative value, although it is somewhat weird:
```rust
/// Opaque binary data of fixed size.
/// Enum parameter specifies the number of bytes per value.
FixedSizeBinary(i32),
```
Although it may be more reasonable to use unsigned int to type the `length`,
in Apache Arrow specification, the `length` must be `i32`.
https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout (For
nested arrays, we always use signed integer to represent the length or offsets.)
```
A fixed size list type is specified like FixedSizeList<T>[N], where T is any
type (primitive or nested) and N is a 32-bit signed integer representing the
length of the lists.
```
What would happen if we give a negative length or negative offsets buffer?
This is a fun game!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]