[ https://issues.apache.org/jira/browse/LUCENE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-7052:
----------------------------------
Attachment: LUCENE-7052-cleanup1.patch
Hi Mike,
I know we originally added the different comparators to allow the index term
dictionary to be sorted in a different order. This never proved to be useful,
as many Lucene queries rely on the default order. The only codec that used
another byte order internally was the Lucene 3 one (and it used the Unicode
spaghetti algorithm to reorder its term enums at runtime). As this is now all
gone, I'd suggest also removing the utf8AsUtf16 comparator. Maybe remove the
comparators altogether, just implement BytesRef.compareTo(), and use that for
sorting?
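For illustration, here is a minimal sketch of what a single natural order could look like. It uses a hypothetical `SimpleBytesRef` class (not Lucene's actual BytesRef) and compares bytes as unsigned values, lexicographically. For valid UTF-8 this order is the same as Unicode code point order, so one `compareTo` would cover the default case:

```java
// Hypothetical stand-in for BytesRef: a slice of a byte array.
final class SimpleBytesRef implements Comparable<SimpleBytesRef> {
  final byte[] bytes;
  final int offset;
  final int length;

  SimpleBytesRef(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  SimpleBytesRef(byte[] bytes) {
    this(bytes, 0, bytes.length);
  }

  // Lexicographic comparison of unsigned bytes; for valid UTF-8 this
  // matches Unicode code point order.
  @Override
  public int compareTo(SimpleBytesRef other) {
    int stop = Math.min(length, other.length);
    for (int i = 0; i < stop; i++) {
      int a = bytes[offset + i] & 0xff;
      int b = other.bytes[other.offset + i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    // One is a prefix of the other (or they are equal): shorter sorts first.
    return length - other.length;
  }
}
```

With a natural order like this, `Arrays.sort(refs)` on a `SimpleBytesRef[]` needs no comparator argument at all.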
I checked the code: utf8SortedAsUTF16SortOrder is only used in TSTLookup and
nowhere else anymore (except in some tests that check alternative sort orders;
those can be removed).
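To see why a separate UTF-16 order exists at all: Java's `String.compareTo` compares UTF-16 code units, which disagrees with code point order once supplementary characters (stored as surrogate pairs, 0xD800 to 0xDFFF) meet BMP characters in U+E000 to U+FFFF. The small check below shows the two orders diverging; the class and the helper `compareCodePoints` are hypothetical names used only for this demonstration:

```java
final class SortOrderDemo {
  // Compares two strings by Unicode code point, not by UTF-16 code unit.
  static int compareCodePoints(String a, String b) {
    int i = 0;
    int j = 0;
    while (i < a.length() && j < b.length()) {
      int ca = a.codePointAt(i);
      int cb = b.codePointAt(j);
      if (ca != cb) {
        return Integer.compare(ca, cb);
      }
      i += Character.charCount(ca);
      j += Character.charCount(cb);
    }
    // Whichever string has input left over is longer and sorts last.
    return Integer.compare(a.length() - i, b.length() - j);
  }

  public static void main(String[] args) {
    String bmp = "\uE000"; // U+E000, private use area
    String supplementary = new String(Character.toChars(0x10000)); // U+10000 = D800 DC00
    // UTF-16 code unit order: 0xE000 > 0xD800, so the BMP character sorts last.
    System.out.println(bmp.compareTo(supplementary) > 0);
    // Code point order: U+E000 < U+10000, so the BMP character sorts first.
    System.out.println(compareCodePoints(bmp, supplementary) < 0);
  }
}
```

Both lines print `true`, which is exactly the case where a UTF-16-ordered comparator and a code-point-ordered one disagree.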
As a first step, I changed the BytesRef code to no longer use inner classes and
to define the comparators with lambdas instead. But I'd suggest at least
removing the UTF-16 one from BytesRef completely and moving it into TSTLookup
as a private implementation detail (as it is only used there).
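As a sketch of where that comparator could end up, assume a hypothetical holder class standing in for TSTLookup that works on plain `byte[]` instead of BytesRef. The comparator is a private lambda, and it uses the byte-fixup technique ICU describes for sorting UTF-8 in UTF-16 order. This is illustrative only, not the code in the attached patch:

```java
import java.util.Comparator;

// Hypothetical holder class standing in for TSTLookup; the comparator is a
// private implementation detail rather than public BytesRef API.
final class Utf16OrderSketch {
  // Compares valid UTF-8 byte arrays in the order their UTF-16 forms would sort.
  private static final Comparator<byte[]> UTF8_AS_UTF16 = (a, b) -> {
    int stop = Math.min(a.length, b.length);
    for (int i = 0; i < stop; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        // Lead bytes 0xEE/0xEF start U+E000..U+FFFF, which sort above
        // surrogate pairs (4-byte lead bytes 0xF0..0xF4) in UTF-16 order.
        // Move them to 0xFC/0xFD, above every 4-byte lead byte.
        if (x >= 0xee && y >= 0xee) {
          if ((x & 0xfe) == 0xee) {
            x += 0xe;
          }
          if ((y & 0xfe) == 0xee) {
            y += 0xe;
          }
        }
        return x - y;
      }
    }
    // One is a prefix of the other, or they are equal.
    return a.length - b.length;
  };

  static int compare(byte[] a, byte[] b) {
    return UTF8_AS_UTF16.compare(a, b);
  }
}
```

The fixup only has to look at the first differing byte. In valid UTF-8 that byte is either a continuation byte in both inputs (always below 0xEE) or a lead byte in both.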
_FYI: The lambda has no speed impact, because the lambda expression is
evaluated only once, and at runtime it is linked to a generated class that
implements Comparator. It just looks nicer than the horrible comparator classes._
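To illustrate the lambda point, the two `static final` fields below define the same comparator, once as an anonymous inner class and once as a lambda. Each field is initialized once, when the class is initialized. In OpenJDK, javac compiles the lambda body to a private method, and the implementing class is generated at runtime rather than shipped as a separate class file. The class name is hypothetical, and `Arrays.compareUnsigned` requires Java 9 or later:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical class showing two equivalent ways to declare a comparator constant.
final class ComparatorStyles {
  // Anonymous inner class: javac emits a separate class file for it.
  static final Comparator<byte[]> UNSIGNED_INNER_CLASS = new Comparator<byte[]>() {
    @Override
    public int compare(byte[] a, byte[] b) {
      return Arrays.compareUnsigned(a, b);
    }
  };

  // Lambda: same behavior; the expression is evaluated once during class
  // initialization, and the implementing class is generated at runtime.
  static final Comparator<byte[]> UNSIGNED_LAMBDA = (a, b) -> Arrays.compareUnsigned(a, b);
}
```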
> BytesRefHash.sort should always sort in unicode code point order
> ----------------------------------------------------------------
>
> Key: LUCENE-7052
> URL: https://issues.apache.org/jira/browse/LUCENE-7052
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master, 6.0
>
> Attachments: LUCENE-7052-cleanup1.patch, LUCENE-7052.patch
>
>
> Today {{BytesRefHash.sort}} takes a custom {{Comparator}} but we always pass
> it {{BytesRef.getUTF8SortedAsUnicodeComparator()}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)