zhuqi-lucas commented on PR #7860:
URL: https://github.com/apache/arrow-rs/pull/7860#issuecomment-3146386048

   > Thank you @zhuqi-lucas -- I think this is quite a clever PR and the benchmark results are very nice
   > 
   > I apologize for the delay in reviewing
   > 
   > I left some specific questions and comments, but I think the only thing that is needed is some more testing. Specifically, since this code is special casing the first four bytes I think we should have some tests that sort and verify strings like
   > 
   > ```
   > "a"
   > "ab"
   > "ba"
   > "baa"
   > "abba"
   > "abbc"
   > "abc"
   > "cda"
   > etc
   > ```
   > 
   > In addition to some targeted testing, I think we should also consider some fuzz testing
   > 
   > 1. make a bunch of random utf8 strings, including many that are short
   > 2. Sort the strings via `Vec::sort`
   > 3. Sort them via the sort kernel
   > 4. Verify the results are the same
   
   Thank you @alamb for the review and the good suggestions. I will add fuzz testing.
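
A minimal sketch of the four fuzz steps quoted above, using only the standard library. Everything named here is an assumption for illustration: `XorShift64`, `random_string`, `prefix_key`, `sort_with_prefix`, and `fuzz_matches_vec_sort` are hypothetical helpers, and `sort_with_prefix` is only a stand-in that compares the first four bytes before falling back to a full comparison. It is not the code from this PR, and a real test would call the arrow sort kernel in its place.

```rust
use std::cmp::Ordering;

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Step 1: a random UTF-8 string of 0..=max_len chars drawn from a small
/// alphabet, so short strings and shared prefixes are common.
fn random_string(rng: &mut XorShift64, max_len: usize) -> String {
    const ALPHABET: [char; 5] = ['a', 'b', 'c', 'é', '中'];
    let len = (rng.next_u64() % (max_len as u64 + 1)) as usize;
    (0..len)
        .map(|_| ALPHABET[(rng.next_u64() % ALPHABET.len() as u64) as usize])
        .collect()
}

/// Hypothetical stand-in: pack the first four bytes (zero-padded) into a
/// big-endian u32 so that integer order follows byte order.
fn prefix_key(s: &str) -> u32 {
    let bytes = s.as_bytes();
    let n = bytes.len().min(4);
    let mut buf = [0u8; 4];
    buf[..n].copy_from_slice(&bytes[..n]);
    u32::from_be_bytes(buf)
}

/// Step 3 (stand-in for the sort kernel): compare prefixes first and fall
/// back to a full byte-wise comparison when the prefixes tie.
fn sort_with_prefix(values: &[String]) -> Vec<String> {
    let mut out = values.to_vec();
    out.sort_by(|a, b| match prefix_key(a).cmp(&prefix_key(b)) {
        Ordering::Equal => a.as_bytes().cmp(b.as_bytes()),
        other => other,
    });
    out
}

/// Steps 2 and 4: sort a copy with `Vec::sort` and check that both results
/// agree, for `rounds` randomly generated inputs.
fn fuzz_matches_vec_sort(seed: u64, rounds: usize) -> bool {
    let mut rng = XorShift64(seed);
    (0..rounds).all(|_| {
        let n = (rng.next_u64() % 50) as usize;
        let input: Vec<String> = (0..n).map(|_| random_string(&mut rng, 6)).collect();
        let mut expected = input.clone();
        expected.sort();
        sort_with_prefix(&input) == expected
    })
}
```

Calling `fuzz_matches_vec_sort` from a `#[test]` with a fixed seed keeps failures reproducible. The targeted strings from the review ("a", "ab", "ba", ...) can be passed through `sort_with_prefix` in the same way.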


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.