Hmm... I think one of the biggest places where we do sorting is in
BytesRefHash#sort. Basically, we use it to collect + dedupe terms during
indexing, but need to sort the terms when flushing so we can build the FST
for the term dictionary, which requires sorted input. Feng Guo got a really
neat speedup in 2023 by caching the buckets used for the radix sort:
https://github.com/apache/lucene/pull/12784.

I feel like that might be a good place to try out orasort.

On Fri, Jan 30, 2026 at 9:57 AM Atri Sharma <[email protected]> wrote:

> Interesting. We rely heavily on MSBRadixSorter right now for exactly this
> reason (common prefixes in ID fields, etc).
>
> The usual killer for these adaptive sorts in Java is the object
> overhead/indirection. If we have to dereference the BytesRef to check the
> prefix, we blow the CPU cache, which negates the algorithmic gain seen
> generally.
>
> That said, if it minimizes compare() calls strictly enough, it might still
> win for BytesRefArray. Worth a quick JMH bench against the current
> StringSorter if someone has cycles.
>
>
> On Fri, 30 Jan 2026 at 11:13 PM, Benjamin Trent <[email protected]>
> wrote:
>
>> Hey y'all,
>>
>> This is a little out of the ordinary, but it was pointed out to me that
>> Oracle's "orasort" has fallen out of patent and is now usable.
>>
>> The original claim is that this adaptive common prefix sorting is much
>> faster than typical quick or radix sort.
>>
>>
>> https://smalldatum.blogspot.com/2026/01/common-prefix-skipping-adaptive-sort.html
>>
>>
>> While I have been working a while in Lucene, I am still pretty ignorant
>> about large portions of the code base (if it ain't vectors, I likely
>> haven't touched it much...). So, wondering if others had ideas if this
>> could actually be used?
>>
>> Here is a Golang impl: https://github.com/mattn/go-orasort
>>
>> Thanks!
>>
>> Ben
>>
>

Reply via email to