The Lucene PMC is pleased to announce the release of Apache Lucene 10.3.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search across high-dimensional vectors, spell correction, or query suggestions.
This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at:

https://lucene.apache.org/core/downloads.html

Lucene 10.3 brings major performance improvements.

Lexical search is now vectorized. This lets it make better use of SIMD instructions, more efficient memory access patterns, and CPU pipelining, and it amortizes the cost of virtual function calls. Lucene's nightly benchmarks report a 40% speedup over Lucene 10.2 when computing the top-100 hits by score on disjunctive and conjunctive queries.

Vector search now does a better job of parallelizing the fetching of vectors into the CPU cache. Lucene's nightly benchmarks report a 15%-20% speedup over Lucene 10.2.

According to Lucene's nightly benchmarks, the terms dictionary is about 30% faster than in Lucene 10.2 on primary-key lookups. This should speed up workloads that depend on terms dictionary lookup performance. Examples include primary-key lookups, indexing operations that specify an ID, and TermInSet queries.

New Features

- Supports reranking with late-interaction model multi-vectors, full-precision vector similarity scores, or any provided DoubleValuesSource, enabling improved ranking of search results.
- Adds MultiIndexMergeScheduler, a multi-tenant wrapper that allows a common merge scheduler to be shared across multiple IndexWriter instances.

API Changes

- Adds an API to fetch the size of the off-heap memory required by a kNN field. This size can help determine the memory needed for optimal search performance. Performance can suffer greatly from page faults when not enough memory is available.
- RandomVectorScorer now supports a bulk scoring interface.
- LeafReader#searchNearestVectors now accepts an AcceptDocs instance instead of a Bits instance to identify the document IDs to filter.
- Collectors can now take advantage of pre-aggregated data to speed up faceting via LeafCollector#collectRange.
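The idea behind bulk scoring can be shown with a small, self-contained sketch. It applies to the RandomVectorScorer change and, more broadly, to amortizing virtual function calls. The names VectorScorer, bulkScore, and DotProductScorer are hypothetical and chosen only for illustration. They are not Lucene's API, and the sketch does not attempt to reproduce Lucene's actual signatures. It only demonstrates the general pattern: a single call scores a whole batch of documents in place of one call per document, which leaves the implementation free to optimize the inner loop.

```java
import java.util.Arrays;

/**
 * Illustrative sketch of per-document vs. bulk scoring.
 * All names here are hypothetical; this is NOT Lucene's RandomVectorScorer API.
 */
final class BulkScoringSketch {

  /** Hypothetical scorer interface with a per-document and a bulk entry point. */
  interface VectorScorer {
    /** Scores a single document against the query. */
    float score(int doc);

    /**
     * Scores the first {@code count} entries of {@code docs} into {@code scores}.
     * The default simply loops over score(int); implementations can override it
     * to avoid paying one interface call per document.
     */
    default void bulkScore(int[] docs, float[] scores, int count) {
      for (int i = 0; i < count; i++) {
        scores[i] = score(docs[i]);
      }
    }
  }

  /** Dot-product scorer over an in-memory array of float vectors. */
  static final class DotProductScorer implements VectorScorer {
    private final float[][] vectors;
    private final float[] query;

    DotProductScorer(float[][] vectors, float[] query) {
      this.vectors = vectors;
      this.query = query;
    }

    @Override
    public float score(int doc) {
      return dot(vectors[doc], query);
    }

    @Override
    public void bulkScore(int[] docs, float[] scores, int count) {
      // One call covers the whole batch. A production implementation could use
      // explicit SIMD here; this sketch stays scalar for clarity.
      for (int i = 0; i < count; i++) {
        scores[i] = dot(vectors[docs[i]], query);
      }
    }

    private static float dot(float[] a, float[] b) {
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        sum += a[i] * b[i];
      }
      return sum;
    }
  }

  public static void main(String[] args) {
    float[][] vectors = { {1f, 0f}, {0f, 1f}, {0.5f, 0.5f} };
    VectorScorer scorer = new DotProductScorer(vectors, new float[] {2f, 4f});
    int[] docs = {0, 1, 2};
    float[] scores = new float[docs.length];
    scorer.bulkScore(docs, scores, docs.length);
    System.out.println(Arrays.toString(scores)); // prints [2.0, 4.0, 3.0]
  }
}
```

The default method keeps simple scorers correct without extra work. Scorers that can exploit batching, for example by prefetching vectors or applying SIMD across several documents, override bulkScore.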
Improvements and Optimizations

- Adds optimistic kNN search to vector queries. Optimistic kNN search addresses a major issue where results could be inconsistent due to race conditions in the queue previously shared across segments during multi-segment search.
- Faster vector search on HNSW graphs through GroupVarInt encoding.
- Searcher managers now support "Adaptive Refresh", which lets users control which commit points they refresh on. This helps segment-replicated systems handle large replication payloads gracefully.

Runtime Behavior Changes and Bug Fixes

- The default ReadAdvice has changed from RANDOM to NORMAL. MMapDirectory no longer sets any specific read advice out of the box.
- RefCountedSharedArena.DEFAULT_MAX_PERMITS has been reduced to 64. This release also fixes an infinite loop that occurred when RefCountedSharedArena's underlying Arena#close failed due to concurrent usage of segments.
- Uses READONCE when reading segment infos, fixing mmap leaks on segment info files. Several other resource leaks are also fixed.

Please read CHANGES.txt for a full list of new features and changes:

https://lucene.apache.org/core/10_3_0/changes/Changes.html
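To give a feel for the GroupVarInt encoding listed under Improvements and Optimizations, here is a minimal, self-contained sketch of a generic group-varint scheme applied to a delta-encoded neighbor list. The byte layout is an assumption based on the common group-varint technique: one tag byte holding four 2-bit length fields, followed by the values in little-endian order. It is not a description of Lucene's on-disk HNSW format. The class and method names are hypothetical. A classic variable-length integer spends one continuation bit per byte. By contrast, this decoder reads a single tag byte to learn the lengths of up to four values, which avoids a data-dependent branch on every byte.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Generic group-varint sketch (assumed layout, NOT Lucene's file format):
 * each group of up to four ints starts with a tag byte whose 2-bit field i
 * holds (byteLength - 1) of value i, followed by each value's bytes in
 * little-endian order, using only as many bytes as the value needs.
 */
final class GroupVarIntSketch {

  static byte[] encode(int[] values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int start = 0; start < values.length; start += 4) {
      int end = Math.min(start + 4, values.length);
      int tag = 0;
      for (int i = start; i < end; i++) {
        tag |= (byteLength(values[i]) - 1) << ((i - start) * 2);
      }
      out.write(tag); // one tag byte describes the whole group
      for (int i = start; i < end; i++) {
        int v = values[i];
        int len = byteLength(v);
        for (int b = 0; b < len; b++) {
          out.write((v >>> (8 * b)) & 0xFF); // little-endian bytes
        }
      }
    }
    return out.toByteArray();
  }

  static int[] decode(byte[] bytes, int count) {
    int[] values = new int[count];
    int pos = 0;
    for (int start = 0; start < count; start += 4) {
      int tag = bytes[pos++] & 0xFF;
      int end = Math.min(start + 4, count);
      for (int i = start; i < end; i++) {
        // Lengths come from the tag, not from per-byte continuation bits.
        int len = ((tag >>> ((i - start) * 2)) & 0x3) + 1;
        int v = 0;
        for (int b = 0; b < len; b++) {
          v |= (bytes[pos++] & 0xFF) << (8 * b);
        }
        values[i] = v;
      }
    }
    return values;
  }

  /** Number of bytes (1..4) needed to store v as an unsigned 32-bit int. */
  static int byteLength(int v) {
    int bits = 32 - Integer.numberOfLeadingZeros(v | 1);
    return (bits + 7) / 8;
  }

  /** Assumes ascending neighbor IDs: stores gaps, which are small, then group-varint encodes them. */
  static byte[] encodeNeighbors(int[] sortedNeighbors) {
    int[] deltas = new int[sortedNeighbors.length];
    int prev = 0;
    for (int i = 0; i < sortedNeighbors.length; i++) {
      deltas[i] = sortedNeighbors[i] - prev;
      prev = sortedNeighbors[i];
    }
    return encode(deltas);
  }

  /** Inverts encodeNeighbors with a running prefix sum over the decoded gaps. */
  static int[] decodeNeighbors(byte[] bytes, int count) {
    int[] values = decode(bytes, count);
    for (int i = 1; i < count; i++) {
      values[i] += values[i - 1];
    }
    return values;
  }

  public static void main(String[] args) {
    int[] neighbors = {3, 17, 300, 70_000, 70_001};
    byte[] encoded = encodeNeighbors(neighbors);
    System.out.println(encoded.length + " bytes"); // prints 10 bytes (vs. 20 as raw ints)
    System.out.println(Arrays.toString(decodeNeighbors(encoded, neighbors.length)));
    // prints [3, 17, 300, 70000, 70001]
  }
}
```

Delta encoding keeps most gaps small enough to fit in one or two bytes. Grouping amortizes the length metadata over four values, so decoding stays compact and branch-light.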