The Lucene PMC is pleased to announce the release of Apache Lucene 10.3.0.

Apache Lucene is a high-performance, full-featured search engine library
written entirely in Java. It is a technology suitable for nearly any
application that requires structured search, full-text search, faceting,
nearest-neighbor search across high-dimensionality vectors, spell
correction or query suggestions.

This release contains numerous bug fixes, optimizations, and improvements,
some of which are highlighted below. The release is available for immediate
download at:

  < https://lucene.apache.org/core/downloads.html >


Lucene 10.3 brings major performance improvements.

Lexical search is now vectorized to take better advantage of SIMD
instructions, more efficient memory access patterns, and CPU pipelining,
and to amortize the cost of virtual function calls. Lucene's nightly
benchmarks report a 40% speedup compared with Lucene 10.2 when computing
top-100 hits by score on disjunctive and conjunctive queries.

Vector search now better parallelizes fetching vectors into the CPU cache.
Lucene's nightly benchmarks report a 15%-20% speedup compared with Lucene
10.2.

The terms dictionary performs about 30% faster than in Lucene 10.2 on
primary-key lookups according to Lucene's nightly benchmarks. This should
help speed up workloads that rely on terms dictionary lookup performance,
including primary-key lookups, indexing operations that specify an ID, and
TermInSet queries.

New Features

   - Supports reranking with late-interaction model multi-vectors,
   full-precision vector similarity scores, or any provided
   DoubleValuesSource, enabling improved ranking of search results.
   - Adds a MultiIndexMergeScheduler – a multi-tenant wrapper that allows
   sharing a common merge scheduler across multiple instances.

API Changes

   - Adds API to fetch the size of off-heap memory required by a KNN field.
   This size can be used to help determine the memory requirements for optimal
   search performance, which can be greatly affected by page faults when not
   enough memory is available.
   - RandomVectorScorer now supports a bulk scoring interface.
   - LeafReader#searchNearestVectors now accepts an AcceptDocs instance
   instead of a Bits instance to identify document IDs to filter.
   - Collectors can now take advantage of pre-aggregated data to speed up
   faceting using LeafCollector#collectRange.
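
To illustrate why a bulk scoring interface can help, here is a minimal
sketch in plain Java. It does not use Lucene's classes: NodeScorer,
DotProductScorer and the bulkScore signature shown here are hypothetical
names chosen for this example, and the dot product stands in for whatever
similarity function is configured. The point is only that one interface
call can score a whole batch of candidates, leaving a tight loop over
arrays for the JIT to optimize.

```java
// Hypothetical interface for illustration only; not Lucene's RandomVectorScorer.
interface NodeScorer {
    // Scores a single candidate node against the query.
    float score(int node);

    // Default bulk path: falls back to one score() call per node.
    default void bulkScore(int[] nodes, float[] scores, int numNodes) {
        for (int i = 0; i < numNodes; i++) {
            scores[i] = score(nodes[i]);
        }
    }
}

// Hypothetical implementation that overrides the bulk path.
final class DotProductScorer implements NodeScorer {
    private final float[][] vectors;
    private final float[] query;

    DotProductScorer(float[][] vectors, float[] query) {
        this.vectors = vectors;
        this.query = query;
    }

    @Override
    public float score(int node) {
        return dot(vectors[node], query);
    }

    @Override
    public void bulkScore(int[] nodes, float[] scores, int numNodes) {
        // A single interface call covers the whole batch; the loop body
        // is plain array arithmetic on the stored vectors.
        for (int i = 0; i < numNodes; i++) {
            scores[i] = dot(vectors[nodes[i]], query);
        }
    }

    private static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

A caller would collect candidate node IDs into an int[] and pass them in
one call instead of invoking score() once per node.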

Improvements and Optimizations

   - Adds optimistic knn search to vector queries. Optimistic knn search
   addresses a major issue where inconsistent results were returned due to
   race conditions in the shared queue previously used for multi-segment
   search.
   - Faster vector search on HNSW graphs through GroupVarInt encoding.
   - Searcher managers now support 'Adaptive Refresh', enabling users to
   control the commit points they refresh on. This helps with graceful
   handling of large replication payloads in segment-replicated systems.
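
For readers unfamiliar with the GroupVarInt encoding mentioned above, the
following is a minimal sketch of the general group-varint technique in
plain Java. It is not Lucene's implementation and its byte layout is an
assumption made for this example: integers are written four at a time, a
single flag byte stores each value's length (1 to 4 bytes) in two bits,
and the value bytes follow in little-endian order. The class and method
names are hypothetical. Decoding reads the four lengths from one byte, so
it avoids the per-byte continuation checks of a classic varint.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of group-varint encoding; not Lucene's code.
final class GroupVarIntSketch {

    // Number of bytes (1..4) needed to hold v, treating it as unsigned.
    static int byteLength(int v) {
        if ((v & 0xFFFFFF00) == 0) return 1;
        if ((v & 0xFFFF0000) == 0) return 2;
        if ((v & 0xFF000000) == 0) return 3;
        return 4;
    }

    // Encodes values in groups of four; values.length must be a multiple of 4.
    static byte[] encode(int[] values) {
        if (values.length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i += 4) {
            // Flag byte: two bits per value, holding (length in bytes - 1).
            int flag = 0;
            for (int j = 0; j < 4; j++) {
                flag |= (byteLength(values[i + j]) - 1) << (2 * j);
            }
            out.write(flag);
            // Value bytes, least significant byte first.
            for (int j = 0; j < 4; j++) {
                int v = values[i + j];
                int n = byteLength(v);
                for (int b = 0; b < n; b++) {
                    out.write((v >>> (8 * b)) & 0xFF);
                }
            }
        }
        return out.toByteArray();
    }

    // Decodes count values (a multiple of 4) produced by encode().
    static int[] decode(byte[] data, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i += 4) {
            int flag = data[pos++] & 0xFF;
            for (int j = 0; j < 4; j++) {
                int n = ((flag >>> (2 * j)) & 0x3) + 1;
                int v = 0;
                for (int b = 0; b < n; b++) {
                    v |= (data[pos++] & 0xFF) << (8 * b);
                }
                values[i + j] = v;
            }
        }
        return values;
    }
}
```

In this sketch, small values such as neighbor-ID deltas take one or two
bytes each, plus one shared flag byte per group of four.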

Runtime Behavior Changes and Bug Fixes

   - The default ReadAdvice has been changed from RANDOM to NORMAL.
   MMapDirectory will no longer set any specific read advice out-of-the-box.
   - RefCountedSharedArena.DEFAULT_MAX_PERMITS has been reduced to 64.
   Also fixes an infinite loop that occurred when RefCountedSharedArena's
   underlying Arena#close failed due to concurrent usage of segments.
   - Uses READONCE when reading segment infos, to fix mmap leaks on segment
   info files. Includes fixes for multiple other resource leaks.



Please read CHANGES.txt for a full list of new features and changes:
https://lucene.apache.org/core/10_3_0/changes/Changes.html
