zhuqi-lucas commented on PR #7860: URL: https://github.com/apache/arrow-rs/pull/7860#issuecomment-3146386048
> Thank you @zhuqi-lucas -- I think this is quite a clever PR and the benchmark results are very nice
>
> I apologize for the delay in reviewing
>
> I left some specific questions and comments, but I think the only thing that is needed is some more testing. Specifically, since this code is special casing the first four bytes I think we should have some tests that sort and verify strings like
>
> ```
> "a"
> "ab"
> "ba"
> "baa"
> "abba"
> "abbc"
> "abc"
> "cda"
> etc
> ```
>
> In addition to some targeted testing, I think we should also consider some fuzz testing
>
> 1. make a bunch of random utf8 strings, including many that are short
> 2. Sort the strings via `Vec::sort`
> 3. Sort them via the sort kernel
> 4. Verify the results are the same

Thank you @alamb for the review and the good suggestions. I will add fuzz testing.
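For illustration, the four fuzz-testing steps above could look roughly like the std-only sketch below. It is not the code from this PR: `prefix_key`, `prefix_cmp`, `random_string`, `fuzz_prefix_sort`, and the `XorShift64` generator are hypothetical names, and a comparator that special-cases the first four bytes stands in for the real sort kernel, which an actual test would call instead.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the kernel's fast path: pack the first four
/// bytes of the string into a big-endian u32, zero-padded when shorter.
fn prefix_key(s: &str) -> u32 {
    let bytes = s.as_bytes();
    let mut buf = [0u8; 4];
    let n = bytes.len().min(4);
    buf[..n].copy_from_slice(&bytes[..n]);
    u32::from_be_bytes(buf)
}

/// Compare by the 4-byte prefix first; on a tie, fall back to a full
/// byte-wise comparison (this also separates "a" from "a\0").
fn prefix_cmp(a: &str, b: &str) -> Ordering {
    prefix_key(a)
        .cmp(&prefix_key(b))
        .then_with(|| a.as_bytes().cmp(b.as_bytes()))
}

/// Tiny deterministic PRNG so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Step 1: a random UTF-8 string of 0 to 6 characters, so many are
/// shorter than four bytes; the alphabet mixes 1-, 2- and 3-byte chars.
fn random_string(rng: &mut XorShift64) -> String {
    const ALPHABET: [char; 6] = ['a', 'b', 'c', 'd', 'é', '中'];
    let len = (rng.next_u64() % 7) as usize;
    (0..len)
        .map(|_| ALPHABET[(rng.next_u64() % ALPHABET.len() as u64) as usize])
        .collect()
}

/// Steps 2 to 4: sort with `Vec::sort`, sort with the prefix-aware
/// comparator, and report whether the two results agree in every round.
fn fuzz_prefix_sort(seed: u64, rounds: usize, n: usize) -> bool {
    let mut rng = XorShift64(seed.max(1));
    for _ in 0..rounds {
        let input: Vec<String> = (0..n).map(|_| random_string(&mut rng)).collect();

        let mut expected = input.clone();
        expected.sort();

        let mut actual = input;
        actual.sort_unstable_by(|a, b| prefix_cmp(a, b));

        if expected != actual {
            return false;
        }
    }
    true
}
```

Calling something like `fuzz_prefix_sort(42, 50, 200)` from a test would run 50 rounds of 200 strings each. The fixed seed keeps any failure reproducible, and the targeted strings from the review (`"a"`, `"ab"`, `"abba"`, ...) can be checked with the same comparator.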
