Hmm... I think one of the biggest places where we do sorting is in BytesRefHash#sort. Basically, we use it to collect + dedupe terms during indexing, but need to sort the terms when flushing so we can build the FST for the term dictionary, which requires sorted input. Feng Guo got a really neat speedup in 2023 by caching the buckets used for the radix sort: https://github.com/apache/lucene/pull/12784.
I feel like that might be a good place to try out orasort. On Fri, Jan 30, 2026 at 9:57 AM Atri Sharma <[email protected]> wrote: > Interesting. We rely heavily on MSBRadixSorter right now for exactly this > reason (common prefixes in ID fields, etc). > > The usual killer for these adaptive sorts in Java is the object > overhead/indirection. If we have to dereference the BytesRef to check the > prefix, we blow the CPU cache, which negates the algorithmic gain seen > generally. > > That said, if it minimizes compare() calls strictly enough, it might still > win for BytesRefArray. Worth a quick JMH bench against the current > StringSorter if someone has cycles. > > > On Fri, 30 Jan 2026 at 11:13 PM, Benjamin Trent <[email protected]> > wrote: > >> Hey y'all, >> >> This is a little out of the ordinary, but it was pointed out to me that >> Oracle's "orasort" has fallen out of patent and is now usable. >> >> The original claim is that this adaptive common prefix sorting is much >> faster than typical quick or radix sort. >> >> >> https://smalldatum.blogspot.com/2026/01/common-prefix-skipping-adaptive-sort.html >> >> >> While I have been working a while in Lucene, I am still pretty ignorant >> about large portions of the code base (if it ain't vectors, I likely >> haven't touched it much...). So, wondering if others had ideas if this >> could actually be used? >> >> Here is a Golang impl: https://github.com/mattn/go-orasort >> >> Thanks! >> >> Ben >> >
