The Lucene PMC is pleased to announce the release of Apache Lucene 10.4.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search across high-dimensionality vectors, spell correction or query suggestions.
This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at:

https://lucene.apache.org/core/downloads.html

Lucene 10.4 brings significant performance improvements and a new vector format. Many Lucene queries should see a performance improvement of 10-15%, and some may see as much as 35%. This is due to a larger block size for terms postings and better use of SIMD-optimized code.

Additionally, there is a new scalar quantized format for dense vectors and kNN search. Lucene104ScalarQuantizedVectorsFormat and Lucene104HnswScalarQuantizedVectorsFormat allow a configurable number of quantization bits: 1, 2, 4, 7, or 8. Recall has improved significantly: for many vector types, quantizing to 2 bits achieves better recall than older formats did at 4 bits. This improves latency while increasing recall for various vector workloads.

*New Features*

- New and improved scalar quantized formats for dense vectors, Lucene104ScalarQuantizedVectorsFormat and Lucene104HnswScalarQuantizedVectorsFormat, which allow quantizing to 1, 2, 4, 7, or 8 bits. For reference, the new 2-bit quantization technique provides better recall and speed than the old 4-bit one.

*API Changes*

- New bulk operation APIs for dense vectors and numeric doc values

*Improvements and Optimizations*

- HNSW graphs can now delay being built for tiny segments and avoid being completely rebuilt when handling deletes
- Handling of deletes in general is much faster and cheaper, with significantly lower storage costs when there are very few deleted docs
- Block size increased for terms postings, significantly improving query latency for many types of queries
- Use a coarser-grained competitive iterator with lower construction costs for numeric sorts against fields with DocValuesSkippers

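To make the bit widths mentioned above concrete, here is a minimal sketch of plain min/max scalar quantization in Java. It is not the algorithm used by Lucene104ScalarQuantizedVectorsFormat, whose details are not described in this announcement. The class name `ScalarQuantSketch`, the fixed [-1, 1] value range, and the dense bit-packing assumed by `packedBytes` are all illustrative assumptions.

```java
import java.util.Arrays;

/** Illustrative min/max scalar quantizer. An assumption-laden sketch, not Lucene's implementation. */
final class ScalarQuantSketch {

  /** Maps each component linearly from [min, max] onto the integers 0 .. 2^bits - 1. */
  static int[] quantize(float[] vector, int bits, float min, float max) {
    int levels = (1 << bits) - 1; // highest representable integer level
    float scale = levels / (max - min);
    int[] out = new int[vector.length];
    for (int i = 0; i < vector.length; i++) {
      // Clamp first so out-of-range components land on the lowest or highest level.
      float clamped = Math.max(min, Math.min(max, vector[i]));
      out[i] = Math.round((clamped - min) * scale);
    }
    return out;
  }

  /** Approximate inverse of quantize: maps each level back to a float in [min, max]. */
  static float[] dequantize(int[] quantized, int bits, float min, float max) {
    int levels = (1 << bits) - 1;
    float step = (max - min) / levels;
    float[] out = new float[quantized.length];
    for (int i = 0; i < quantized.length; i++) {
      out[i] = min + quantized[i] * step;
    }
    return out;
  }

  /** Bytes needed for dims components, ASSUMING dense bit-packing with no per-vector metadata. */
  static int packedBytes(int dims, int bits) {
    return (dims * bits + 7) / 8;
  }

  public static void main(String[] args) {
    float[] v = {-1.0f, -0.5f, 0.0f, 0.5f, 1.0f};
    for (int bits : new int[] {1, 2, 4, 7, 8}) {
      int[] q = quantize(v, bits, -1f, 1f);
      System.out.println(
          bits + " bits: " + Arrays.toString(q)
              + ", 1024 dims -> " + packedBytes(1024, bits) + " bytes");
    }
  }
}
```

Fewer bits mean fewer representable levels per component and a proportionally smaller vector, which is the trade-off behind the recall and latency comparison between the 2-bit and 4-bit levels above. The real on-disk size may differ from `packedBytes`, since an actual format can store additional per-vector data.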
*Runtime Behavior Changes and Bug Fixes*

- Fix tessellator failure by preferring the shared vertex that is the leftmost vertex of the hole
- The `reverse` field of SortField is now final. If you have subclassed SortField, you should set `reverse` in the super constructor.
- Align float vectors on disk to 64 bytes, for optimal performance on Arm Neoverse machines

Please read CHANGES.txt for a full list of new features and changes:

https://lucene.apache.org/core/10_4_0/changes/Changes.html
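
As a side note on the 64-byte alignment item above, the arithmetic behind aligning data at a file offset can be sketched as follows. This is a generic illustration, not the code Lucene uses: the class `AlignmentSketch` and its methods are hypothetical names, and the example offsets are made up.

```java
/** Illustrative offset-alignment arithmetic. Hypothetical helper, not Lucene code. */
final class AlignmentSketch {

  /** Rounds offset up to the next multiple of alignment, which must be a power of two. */
  static long alignUp(long offset, int alignment) {
    if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
      throw new IllegalArgumentException("alignment must be a positive power of two");
    }
    // Adding (alignment - 1) and clearing the low bits rounds up to a multiple of alignment.
    return (offset + alignment - 1) & -(long) alignment;
  }

  /** Number of padding bytes to write before data that must start on an aligned offset. */
  static int padding(long offset, int alignment) {
    return (int) (alignUp(offset, alignment) - offset);
  }

  public static void main(String[] args) {
    for (long offset : new long[] {0, 1, 64, 100}) {
      System.out.println(
          "offset " + offset + " -> aligned " + alignUp(offset, 64)
              + " (padding " + padding(offset, 64) + ")");
    }
  }
}
```

Once the first vector starts on a 64-byte boundary, every following vector stays aligned only if the per-vector size is itself a multiple of 64 bytes, for example a 1024-dimension float vector at 4096 bytes.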
