[PR] test(arrow-ord): remove duplicate logic and make it easier to add tests [arrow-rs]

via GitHub Thu, 26 Dec 2024 09:47:05 -0800


rluvaton opened a new pull request, #6913:
URL: https://github.com/apache/arrow-rs/pull/6913


   # Rationale for this change
    
   While working on #6911 and I started to write tests, I found out I just 
duplicated the logic, and it was messier, so cleaning up to make it easier to 
create new tests.
   
   **Background:** all sort arrays test helper were given 2 vectors with native 
rust types (e.g. `Vec<Option<bool>>`), one for the input and the other for the 
expected output.
   
   and the sort test helper convert it to the input and expected arrays to 
Arrow arrays (e.g. `BooleanArray`), and then call `sort` (or `sort_limit` 
depend on the input) and assert that the expected match the sorted.
   
   this is done to help make the test simpler by just providing what the actual 
arrow array represent without messing with converting to arrow type.
   
   the problem is that some Rust types can have multiple Arrow types, for 
example `Vec<Option<Vec<u8>>>` can be `BinaryArray` or `LargeBinaryArray` or 
`FixedSizeBinaryArray` or `BinaryViewArray` (same for lists and string).
   
   in that case each sort helper function convert the input data to some of the 
arrow type (like `FixedSizeBinaryArray`) or to multiple arrow type 
(`BinaryArray` and `LargeBinaryArray`)
   
   and then do the same as described above.
   
   The problem with this is that there are a lot of duplication and adding 
tests to new type is just adding a lot of boilerplate instead of just starting 
to write code
   
   # What changes are included in this PR?
   
   So instead of this approach I created a single `test_sort_array` function 
that is given the `input`, `expected`  and a function to convert that rust 
input type (e.g.  `Vec<Option<Vec<u8>>>` like explained above) to all variants 
from that type (e.g. `BinaryArray` and `LargeBinaryArray` and 
`FixedSizeBinaryArray` and `BinaryViewArray`)
   
   then created for each test type a new helper to convert that rust type to 
arrow type
   
   and moved all the build array helpers to sub module for more organize code.
   
   
   But why do I pass a function to build the array and not using Rust trait 
system? this will surely be cleaner as you can have the actual function 
implementation hidden behind the input type and it would make the test cleaner.
   
   You (basically my other self) are right. The problem is that I tried and it 
can't work without implementing for each primitive type manually (I think), 
because let's see the following example:
   
   Let's say I have a trait called `ConvertToArray`
   
   ```rust
   trait ConvertToArrays {
        fn convert_to_array(input: Vec<Option<Self>>) -> Vec<ArrayRef>;
   }
   ```
   
   and then I implement it for primitive type to create primitive array:
   ```rust
   impl<T> ConvertToArrays for T::Native
        where T: ArrowPrimitiveType,
        PrimitiveArray<T>: From<Vec<Option<T::Native>>>
   {
        fn convert_to_array(input: Vec<Option<Self>>) -> Vec<ArrayRef> {
                vec![Arc::new(PrimitiveArray::<T>::from(data)) as ArrayRef]
        }
   } 
   ```
   
   all good and it works. the problem is I can't implement it for `bool` when I 
want to test sorting `BooleanArray` because `ArrowPrimitiveType` can be 
implemented in the future for `bool` and this would cause duplicate 
implementation. and for that reason Rust disallow that
   
   # Are there any user-facing changes?
   nope, only tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] test(arrow-ord): remove duplicate logic and make it easier to add tests [arrow-rs]

Reply via email to