[ 
https://issues.apache.org/jira/browse/LUCENE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-7052:
----------------------------------
    Attachment: LUCENE-7052-cleanup1.patch

Hi Mike,
I know we originally added the different comparators so that the index term 
dict could be sorted in a different order. This never proved to be useful, as 
many Lucene queries rely on the default order. The only codec that used 
another byte order internally was the Lucene 3 one (but it used the unicode 
spaghetti algorithm to reorder its term enums at runtime). As this is now all 
gone, I'd suggest also removing the utf8AsUtf16 comparator. Maybe remove the 
comparators altogether, just implement BytesRef.compareTo(), and use that for 
sorting?
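
To illustrate what a natural compareTo() order could look like, here is a 
minimal sketch on plain byte arrays. It is not the actual BytesRef code; the 
class and method names are made up for illustration. It relies on the fact 
that comparing UTF-8 bytes as unsigned values gives the same order as 
comparing Unicode code points.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for BytesRef: works on plain byte[] only.
class UnsignedByteOrder {

    // Compare two byte arrays as unsigned bytes; on a common prefix,
    // the shorter array sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            // Mask to 0..255 so bytes >= 0x80 are not treated as negative.
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] bmp = "\uFF61".getBytes(StandardCharsets.UTF_8);        // U+FF61
        byte[] supp = "\uD800\uDC00".getBytes(StandardCharsets.UTF_8); // U+10000
        // Negative result: U+FF61 sorts before U+10000, as in code point order.
        System.out.println(compareUnsigned(bmp, supp));
    }
}
```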

I checked the code: utf8SortedAsUTF16SortOrder is now only used in TSTLookup 
and nowhere else (except for some tests that check alternative sort orders; 
those can be removed).

As a first step I changed the BytesRef code to no longer use inner classes and 
instead use a lambda to define the comparators. But I'd suggest removing at 
least the UTF-16 one from BytesRef completely and moving it into TSTLookup as 
a private, hidden implementation detail (as it is only used there).
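
As a rough sketch of what "private impl detail" could mean here: the class 
below is a hypothetical stand-in, not the real TSTLookup, and it does not use 
Lucene's actual UTF-16 comparator. For simplicity it decodes the UTF-8 bytes 
and relies on String.compareTo(), which compares UTF-16 code units.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

// Hypothetical stand-in for the one class that still needs UTF-16 order.
class TstLookupSketch {

    // Private: no other class can depend on this sort order.
    // Assumption: decoding to String is acceptable for a sketch; a real
    // implementation would compare the UTF-8 bytes directly.
    private static final Comparator<byte[]> UTF16_ORDER =
        (a, b) -> new String(a, StandardCharsets.UTF_8)
            .compareTo(new String(b, StandardCharsets.UTF_8));

    // Hypothetical entry point used by the lookup's own sorting code.
    static int compareForTree(byte[] a, byte[] b) {
        return UTF16_ORDER.compare(a, b);
    }

    public static void main(String[] args) {
        byte[] bmp = "\uFF61".getBytes(StandardCharsets.UTF_8);        // U+FF61
        byte[] supp = "\uD800\uDC00".getBytes(StandardCharsets.UTF_8); // U+10000
        // Positive result: in UTF-16 order the surrogate pair for U+10000
        // sorts before U+FF61, the opposite of code point order.
        System.out.println(compareForTree(bmp, supp));
    }
}
```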

_FYI: The lambda has no speed impact because it is evaluated only once and 
internally compiles to a class that implements Comparator. It just looks 
nicer than the horrible comparator classes._
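
For comparison, a small sketch of the two styles side by side. This is not 
the code from the patch; the class, field, and method names are invented, and 
it sorts plain byte arrays rather than BytesRef instances. Both fields are 
initialized once, when the class is loaded, so the sort loop only ever calls 
compare() on an existing Comparator object.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical example class; names are not taken from Lucene.
class LambdaComparatorSketch {

    // Unsigned byte-wise comparison shared by both comparators.
    static int compareUnsigned(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Old style: anonymous inner class.
    static final Comparator<byte[]> AS_INNER_CLASS = new Comparator<byte[]>() {
        @Override
        public int compare(byte[] a, byte[] b) {
            return compareUnsigned(a, b);
        }
    };

    // New style: the same comparator as a lambda.
    static final Comparator<byte[]> AS_LAMBDA = (a, b) -> compareUnsigned(a, b);

    public static void main(String[] args) {
        byte[][] terms = { {(byte) 0xC3, (byte) 0xA9}, {0x7A}, {0x61} };
        Arrays.sort(terms, AS_LAMBDA);
        System.out.println(Arrays.deepToString(terms));
    }
}
```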

> BytesRefHash.sort should always sort in unicode code point order
> ----------------------------------------------------------------
>
>                 Key: LUCENE-7052
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7052
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.0
>
>         Attachments: LUCENE-7052-cleanup1.patch, LUCENE-7052.patch
>
>
> Today {{BytesRefHash.sort}} takes a custom {{Comparator}} but we always pass 
> it {{BytesRef.getUTF8SortedAsUnicodeComparator()}}.


