[GitHub] [druid] gianm opened a new pull request, #13364: Add various string comparison methods to StringUtils, to be used later.

GitBox Mon, 14 Nov 2022 13:50:55 -0800


gianm opened a new pull request, #13364:
URL: https://github.com/apache/druid/pull/13364


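   There are various places in Druid code where we assume that String.compareTo
   is consistent with Unicode code-point ordering. Sadly this is not the case:
   String.compareTo compares UTF-16 code units, so the two orders can diverge
   once supplementary characters (encoded as surrogate pairs) are involved.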
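
   As an illustration of the mismatch, here is a minimal sketch, not code
   from the patch. The class and method names are hypothetical. It compares
   U+FB01 (a single UTF-16 code unit) with U+1F600 (a surrogate pair) under
   both orderings:

```java
// Hypothetical demo class; not part of Druid.
final class CompareToVsCodePoint {
    // Sign of String.compareTo, which compares UTF-16 code units.
    static int utf16Order(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Sign of a comparison made one Unicode code point at a time.
    static int codePointOrder(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cpA = a.codePointAt(i);
            int cpB = b.codePointAt(j);
            if (cpA != cpB) {
                return Integer.signum(Integer.compare(cpA, cpB));
            }
            i += Character.charCount(cpA);
            j += Character.charCount(cpB);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.signum(Integer.compare(a.length() - i, b.length() - j));
    }

    public static void main(String[] args) {
        String bmp = "\uFB01";          // U+FB01, one code unit (0xFB01)
        String emoji = "\uD83D\uDE00";  // U+1F600, surrogate pair starting with 0xD83D
        System.out.println(utf16Order(bmp, emoji));      // 1: 0xFB01 > 0xD83D
        System.out.println(codePointOrder(bmp, emoji));  // -1: U+FB01 < U+1F600
    }
}
```

   The two calls disagree because the high surrogate 0xD83D is numerically
   smaller than 0xFB01, even though the code point it helps encode is larger.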
   
   To help deal with this, this patch introduces the following helpers:
   
   1) compareUnicode: Compares two Strings in Unicode code-point order.
   2) compareUtf8: Compares two UTF-8 byte arrays in Unicode code-point order.
      Equivalent to comparison as unsigned bytes.
   3) compareUtf8UsingJavaStringOrdering: Compares two UTF-8 byte arrays, or
      ByteBuffers, in a manner consistent with String.compareTo.
   
   There is no helper for comparing two Strings in a manner consistent
   with String.compareTo, because for that we can use compareTo directly.
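
   To make the byte-array cases concrete, here is a rough sketch of the two
   UTF-8 orderings described above. It is an assumption-laden illustration,
   not the StringUtils implementation: the class and method names are
   hypothetical, and the second method simply decodes to String, which the
   real helper may well avoid for efficiency.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; not the code added by this patch.
final class Utf8CompareSketch {
    // Compares UTF-8 bytes as unsigned values. For valid UTF-8 this matches
    // Unicode code-point order.
    static int compareUtf8Unsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Compares UTF-8 bytes in String.compareTo order by decoding first.
    // Simple but allocates; shown only to pin down the intended ordering.
    static int compareUtf8AsJavaStrings(byte[] a, byte[] b) {
        String sa = new String(a, StandardCharsets.UTF_8);
        String sb = new String(b, StandardCharsets.UTF_8);
        return sa.compareTo(sb);
    }

    public static void main(String[] args) {
        byte[] bmp = "\uFB01".getBytes(StandardCharsets.UTF_8);          // EF AC 81
        byte[] emoji = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);  // F0 9F 98 80
        System.out.println(compareUtf8Unsigned(bmp, emoji) < 0);      // true
        System.out.println(compareUtf8AsJavaStrings(bmp, emoji) > 0); // true
    }
}
```

   The same pair of values sorts in opposite directions under the two
   methods, which is why a dictionary written with one ordering cannot be
   safely searched with the other.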
   
   The patch also fixes an inconsistency between the String and UTF-8
   dictionary GenericIndexed flavors of string-typed columns: they were
   formerly using incompatible comparators.
   
   There may be other inconsistencies to address in future patches. One area
   I can think of is MSQ frames, which use Unicode code-point order for sorting
   Strings. I think this is harmless in most cases, but can be a problem when
   MSQ sort order interfaces with some other code expecting a different order.
   For
   example, in the range shard specs that MSQ generates.


