rluvaton opened a new pull request, #6913:
URL: https://github.com/apache/arrow-rs/pull/6913
# Rationale for this change
While working on #6911 and I started to write tests, I found out I just
duplicated the logic, and it was messier, so cleaning up to make it easier to
create new tests.
**Background:** all sort arrays test helper were given 2 vectors with native
rust types (e.g. `Vec<Option<bool>>`), one for the input and the other for the
expected output.
and the sort test helper convert it to the input and expected arrays to
Arrow arrays (e.g. `BooleanArray`), and then call `sort` (or `sort_limit`
depend on the input) and assert that the expected match the sorted.
this is done to help make the test simpler by just providing what the actual
arrow array represent without messing with converting to arrow type.
the problem is that some Rust types can have multiple Arrow types, for
example `Vec<Option<Vec<u8>>>` can be `BinaryArray` or `LargeBinaryArray` or
`FixedSizeBinaryArray` or `BinaryViewArray` (same for lists and string).
in that case each sort helper function convert the input data to some of the
arrow type (like `FixedSizeBinaryArray`) or to multiple arrow type
(`BinaryArray` and `LargeBinaryArray`)
and then do the same as described above.
The problem with this is that there are a lot of duplication and adding
tests to new type is just adding a lot of boilerplate instead of just starting
to write code
# What changes are included in this PR?
So instead of this approach I created a single `test_sort_array` function
that is given the `input`, `expected` and a function to convert that rust
input type (e.g. `Vec<Option<Vec<u8>>>` like explained above) to all variants
from that type (e.g. `BinaryArray` and `LargeBinaryArray` and
`FixedSizeBinaryArray` and `BinaryViewArray`)
then created for each test type a new helper to convert that rust type to
arrow type
and moved all the build array helpers to sub module for more organize code.
But why do I pass a function to build the array and not using Rust trait
system? this will surely be cleaner as you can have the actual function
implementation hidden behind the input type and it would make the test cleaner.
You (basically my other self) are right. The problem is that I tried and it
can't work without implementing for each primitive type manually (I think),
because let's see the following example:
Let's say I have a trait called `ConvertToArray`
```rust
trait ConvertToArrays {
fn convert_to_array(input: Vec<Option<Self>>) -> Vec<ArrayRef>;
}
```
and then I implement it for primitive type to create primitive array:
```rust
impl<T> ConvertToArrays for T::Native
where T: ArrowPrimitiveType,
PrimitiveArray<T>: From<Vec<Option<T::Native>>>
{
fn convert_to_array(input: Vec<Option<Self>>) -> Vec<ArrayRef> {
vec![Arc::new(PrimitiveArray::<T>::from(data)) as ArrayRef]
}
}
```
all good and it works. the problem is I can't implement it for `bool` when I
want to test sorting `BooleanArray` because `ArrowPrimitiveType` can be
implemented in the future for `bool` and this would cause duplicate
implementation. and for that reason Rust disallow that
# Are there any user-facing changes?
nope, only tests
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]