salvatorecampagna opened a new pull request, #15413: URL: https://github.com/apache/lucene/pull/15413
## Scope

For segments with sparse deletions (<= 1%), this change tracks only the deleted document IDs instead of maintaining a full bitset over all documents. This simple change reduces LiveDocs memory usage by up to 8x and speeds up deleted-document iteration by 3-4x in typical append-heavy workloads.

## The Problem

Lucene currently allocates `maxDoc/8` bytes for LiveDocs, independent of the number of deletions. For example, a 10M-document segment always allocates ~1.2 MB even if only 100K documents (1%) are deleted, wasting memory on mostly live documents. Storing only the deleted document IDs reduces memory by up to 8x at a 1% deletion rate, and the savings scale linearly: a 100M-document segment with 1% deletions drops from ~12 MB to ~800 KB (random pattern).

## How It Works

The implementation uses an adaptive approach with two complementary implementations, each optimized for its case:

- **SparseLiveDocs** (<= 1% deletions): stores only the deleted doc IDs in a `SparseFixedBitSet`
- **DenseLiveDocs** (> 1% deletions): keeps the traditional `FixedBitSet` of live docs

Both implementations expose efficient iteration through a new `LiveDocs` interface: `deletedDocsIterator()` for O(deletedDocs) iteration and `liveDocsIterator()` for O(liveDocs) iteration. The interface also lets the test framework wrap either implementation with `AssertingLiveDocs`, which delegates to the underlying `LiveDocs` methods while adding assertions; this preserves compatibility with the existing test infrastructure without requiring changes to test code. The codec automatically selects the right representation when reading `.liv` files from disk.

Benchmarks show the crossover point, where sparse and dense performance equalize, occurs around 5-10% deletions depending on the deletion pattern. At the 1% threshold, sparse provides clear wins in both memory and iteration speed across all patterns; even under the worst-case (uniform) distribution, sparse remains faster at <= 1%. This conservative threshold guarantees predictable behavior while targeting the most common case, where sparse representations excel.

The `LiveDocs` interface is exposed through a new method, `LeafReader.getLiveDocsWithDeletedIterator()`, which enables O(deletedDocs) iteration via `deletedDocsIterator()`. Consumers can check the iterator's `cost()` to decide whether iterating deleted docs is worthwhile for their use case. Rather than replacing the existing `getLiveDocs()` API (which would require extensive changes across the codebase), this approach lets consumers opt in to the optimization when they need it. Use cases like `PointTreeBulkCollector` (histogram correction, #15226) can now efficiently iterate only the deleted documents to adjust their counts, while existing code continues to work unchanged. A sketch of the interface and a consumer-side example follow.
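For concreteness, here is a minimal sketch of the shape this design describes. It is not the PR's actual code: the `LiveDocs`, `SparseLiveDocs`, and `DenseLiveDocs` names come from the description above, while the internals below are assumptions built on Lucene's existing `Bits`, `SparseFixedBitSet`, `FixedBitSet`, and `BitSetIterator` utilities.

```java
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;
import org.apache.lucene.util.SparseFixedBitSet;

/** Sketch of the LiveDocs interface: regular Bits semantics plus cheap iteration. */
interface LiveDocs extends Bits {
  /** Iterates only the deleted doc IDs, in O(deletedDocs). */
  DocIdSetIterator deletedDocsIterator();

  /** Iterates only the live doc IDs, in O(liveDocs). */
  DocIdSetIterator liveDocsIterator();
}

/** Sparse variant (<= 1% deletions): stores only the deleted doc IDs. */
final class SparseLiveDocs implements LiveDocs {
  private final SparseFixedBitSet deleted; // set bits = deleted docs
  private final int numDeleted;

  SparseLiveDocs(SparseFixedBitSet deleted, int numDeleted) {
    this.deleted = deleted;
    this.numDeleted = numDeleted;
  }

  @Override
  public boolean get(int doc) {
    return deleted.get(doc) == false; // live means "not deleted"
  }

  @Override
  public int length() {
    return deleted.length();
  }

  @Override
  public DocIdSetIterator deletedDocsIterator() {
    // The iterator's cost() reflects the number of deletions, which lets
    // consumers decide whether a deletions-driven pass is worthwhile.
    return new BitSetIterator(deleted, numDeleted);
  }

  @Override
  public DocIdSetIterator liveDocsIterator() {
    throw new UnsupportedOperationException("elided in this sketch");
  }
}

/** Dense variant (> 1% deletions): a traditional bitset of live docs. */
final class DenseLiveDocs implements LiveDocs {
  private final FixedBitSet live; // set bits = live docs
  private final int numDeleted;

  DenseLiveDocs(FixedBitSet live, int numDeleted) {
    this.live = live;
    this.numDeleted = numDeleted;
  }

  @Override
  public boolean get(int doc) {
    return live.get(doc);
  }

  @Override
  public int length() {
    return live.length();
  }

  @Override
  public DocIdSetIterator deletedDocsIterator() {
    throw new UnsupportedOperationException("elided in this sketch");
  }

  @Override
  public DocIdSetIterator liveDocsIterator() {
    return new BitSetIterator(live, live.length() - numDeleted);
  }
}
```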
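And a hypothetical consumer along the lines of the histogram-correction use case. `HistogramCorrector`, `subtractDeletedDocs`, and `correctHistogram` are made-up names; `getLiveDocsWithDeletedIterator()` and the `cost()` check follow the description above.

```java
import java.io.IOException;

import org.apache.lucene.index.LeafReader;
import org.apache.lucene.search.DocIdSetIterator;

final class HistogramCorrector {
  /** Subtracts deleted docs' contributions from a pre-aggregated histogram. */
  void subtractDeletedDocs(LeafReader leafReader) throws IOException {
    LiveDocs liveDocs = leafReader.getLiveDocsWithDeletedIterator(); // new API in this PR
    if (liveDocs == null) {
      return; // segment has no deletions; nothing to correct
    }
    DocIdSetIterator deleted = liveDocs.deletedDocsIterator();
    // cost() approximates the number of deleted docs: only run the
    // deletions-driven pass when enumerating them is cheap.
    if (deleted.cost() > leafReader.maxDoc() / 100L) {
      return; // too many deletions; fall back to a live-docs-driven pass
    }
    for (int doc = deleted.nextDoc();
        doc != DocIdSetIterator.NO_MORE_DOCS;
        doc = deleted.nextDoc()) {
      correctHistogram(doc); // hypothetical: undo this doc's contribution
    }
  }

  private void correctHistogram(int doc) {
    // placeholder for the actual histogram adjustment
  }
}
```

Gating on `cost()` keeps the fallback cheap: the consumer never walks the deleted docs unless they are known to be few.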
## Benchmark Results

**10M document segment:**

### Random pattern (typical real-world scenario)

| Deletion Rate | Dense Memory | Sparse Memory | Reduction | Deleted-Docs Iteration (dense -> sparse) | Speedup |
|---------------|--------------|---------------|-----------|------------------------------------------|---------|
| 0.1% | 1,250,040 bytes | 163,776 bytes | 7.6x | 2.75 ms -> 0.09 ms | 31.7x faster |
| 1% | 1,250,040 bytes | 804,008 bytes | 1.6x | 3.25 ms -> 0.88 ms | 3.7x faster |
| 5% | 1,250,040 bytes | 1,318,440 bytes | 0.9x* | 5.57 ms -> 4.56 ms | 1.2x faster |

### Clustered pattern (best case for sparse)

| Deletion Rate | Dense Memory | Sparse Memory | Reduction | Deleted-Docs Iteration (dense -> sparse) | Speedup |
|---------------|--------------|---------------|-----------|------------------------------------------|---------|
| 0.1% | 1,250,040 bytes | 30,760 bytes | 40.6x | 2.75 ms -> 0.08 ms | 36.6x faster |
| 1% | 1,250,040 bytes | 42,376 bytes | 29.5x | 2.95 ms -> 0.75 ms | 3.9x faster |
| 5% | 1,250,040 bytes | 93,856 bytes | 13.3x | 2.98 ms -> 3.75 ms | 0.8x slower* |

### Uniform pattern (worst case for sparse)

| Deletion Rate | Dense Memory | Sparse Memory | Reduction | Deleted-Docs Iteration (dense -> sparse) | Speedup |
|---------------|--------------|---------------|-----------|------------------------------------------|---------|
| 0.1% | 1,250,040 bytes | 185,624 bytes | 6.7x | 2.8 ms -> 0.07 ms | 40.1x faster |
| 1% | 1,250,040 bytes | 1,318,368 bytes | 0.9x* | 2.75 ms -> 0.84 ms | 3.3x faster |
| 5% | 1,250,040 bytes | 1,318,552 bytes | 0.9x* | 2.9 ms -> 3.85 ms | 0.8x slower* |

(*) Entries where sparse is worse than dense.

**Why the conservative 1% threshold?** These benchmarks show significant pattern-dependent behavior:

- **CLUSTERED** (best case): sparse wins on memory even at 5% (13x reduction) but loses on iteration speed
- **UNIFORM** (worst case): sparse already uses more memory at 1% and loses on iteration speed at 5%
- **RANDOM** (typical): middle ground between clustered and uniform

By choosing 1%, sparse is used only where it delivers clear memory wins across most deletion patterns and never performs catastrophically worse. At <= 1%, sparse provides a 1.6-40x memory reduction and a 3-40x iteration speedup across typical and best-case deletion distributions.

### Pathological case: maximally scattered deletions

This PR also tested a worst-case scenario with deletions maximally scattered across the bitset (1.5625% deletion rate):

| Segment Size | Dense Memory | Sparse Memory | Overhead | Deleted-Docs Iteration (dense -> sparse) | Speedup |
|--------------|--------------|---------------|----------|------------------------------------------|---------|
| 100K | 12,544 bytes | 13,376 bytes | +6.6% | 0.03 ms -> 0.01 ms | 4.3x faster |
| 500K | 62,544 bytes | 66,032 bytes | +5.6% | 0.13 ms -> 0.03 ms | 4.3x faster |
| 1M | 125,040 bytes | 131,944 bytes | +5.5% | 0.27 ms -> 0.06 ms | 4.3x faster |
| 5M | 625,040 bytes | 659,416 bytes | +5.5% | 1.34 ms -> 0.32 ms | 4.1x faster |
| 10M | 1,250,040 bytes | 1,318,552 bytes | +5.5% | 2.70 ms -> 0.65 ms | 4.2x faster |
| 50M | 6,250,040 bytes | 6,591,904 bytes | +5.5% | 13.62 ms -> 3.26 ms | 4.2x faster |
| 100M | 12,500,040 bytes | 13,183,712 bytes | +5.5% | 27.47 ms -> 6.52 ms | 4.2x faster |

Even in this unfavorable scenario, where sparse uses 5-6% more memory, the iteration speedup stays stable at ~4x across all segment sizes. The overhead is bounded and predictable, an acceptable trade-off for iteration-heavy workloads. A sketch of how these deletion patterns might be generated follows.
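The PR text does not spell out how the benchmark's deletion patterns were generated, so the following is only a plausible reconstruction (all names here are hypothetical). It also hints at why the patterns diverge so sharply: a clustered run of deletions touches only a handful of `SparseFixedBitSet`'s internal blocks, while uniformly spaced deletions touch nearly all of them.

```java
import java.util.Random;

import org.apache.lucene.util.FixedBitSet;

/** Hypothetical generators for the three deletion patterns benchmarked above. */
final class DeletionPatterns {
  /** RANDOM: each deleted doc picked uniformly at random (typical real-world mix). */
  static FixedBitSet random(int maxDoc, int numDeleted, long seed) {
    Random r = new Random(seed);
    FixedBitSet deleted = new FixedBitSet(maxDoc);
    int remaining = numDeleted;
    while (remaining > 0) {
      int doc = r.nextInt(maxDoc);
      if (deleted.get(doc) == false) {
        deleted.set(doc);
        remaining--;
      }
    }
    return deleted;
  }

  /** CLUSTERED: one contiguous run of deletions, e.g. an expired time range. */
  static FixedBitSet clustered(int maxDoc, int numDeleted) {
    FixedBitSet deleted = new FixedBitSet(maxDoc);
    deleted.set(0, numDeleted); // delete docs [0, numDeleted)
    return deleted;
  }

  /** UNIFORM: deletions evenly spaced, touching as many bitset words as possible. */
  static FixedBitSet uniform(int maxDoc, int numDeleted) {
    FixedBitSet deleted = new FixedBitSet(maxDoc);
    int stride = maxDoc / numDeleted;
    for (int doc = 0, n = 0; n < numDeleted && doc < maxDoc; doc += stride, n++) {
      deleted.set(doc);
    }
    return deleted;
  }
}
```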
## Backward Compatibility

This change introduces no breaking changes:

- Disk format remains unchanged (`.liv` files are fully compatible)
- Existing indexes require no reindexing
- Code using the `Bits` API continues to function as before

**Note on disk format:** This PR keeps the existing Lucene90 `.liv` format (a dense bitset on disk) to minimize changes and maintain compatibility. When reading, `Lucene90LiveDocsFormat` converts the on-disk dense representation to the in-memory sparse representation for segments with <= 1% deletions. This conversion has negligible overhead since it happens only when segments are loaded, not during queries (see the sketch below). A follow-up PR could introduce a new codec version that writes sparse deletions natively to disk, eliminating the conversion step entirely and reducing disk space for `.liv` files.

Fixes #13084
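For reference, a rough sketch (not the PR's actual code) of the read-time conversion described in the note above, reusing the hypothetical `SparseLiveDocs`/`DenseLiveDocs` classes from the earlier sketch. In real code the deletion count would come from the segment's metadata (e.g. `SegmentCommitInfo.getDelCount()`).

```java
import org.apache.lucene.util.FixedBitSet;
import org.apache.lucene.util.SparseFixedBitSet;

final class LiveDocsLoader {
  /**
   * Picks the in-memory representation after reading the dense on-disk
   * bitset; the .liv file format itself is untouched.
   */
  static LiveDocs toInMemoryLiveDocs(FixedBitSet liveFromDisk, int numDeleted) {
    int maxDoc = liveFromDisk.length();
    if (numDeleted * 100L <= maxDoc) { // <= 1% deletions: re-encode sparsely
      SparseFixedBitSet deleted = new SparseFixedBitSet(maxDoc);
      for (int doc = 0; doc < maxDoc; doc++) {
        if (liveFromDisk.get(doc) == false) {
          deleted.set(doc); // invert: record only the deleted docs
        }
      }
      return new SparseLiveDocs(deleted, numDeleted);
    }
    return new DenseLiveDocs(liveFromDisk, numDeleted); // keep the dense bitset
  }
}
```

This single pass over `maxDoc` runs once per segment load, which is why the conversion cost does not show up at query time.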
