salvatorecampagna opened a new pull request, #15413:
URL: https://github.com/apache/lucene/pull/15413

   ## Scope
   
   For segments with sparse deletions (<=1%), this change tracks only the deleted document IDs instead of maintaining a full bitset over all documents. This reduces LiveDocs memory usage by up to 8x and speeds up deleted-document iteration by 3-4x in typical append-heavy workloads.
   
   ## The Problem
   
   Lucene currently allocates `maxDoc/8` bytes for LiveDocs, independent of the number of deletions. For example, a 10M-document segment always allocates ~1.2 MB even if only 100K documents (1%) are deleted, wasting memory when nearly all documents are live.
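   The dense cost quoted above is simple arithmetic and can be sanity-checked without any Lucene classes (the method name below is illustrative, not part of the PR):

```java
public class DenseLiveDocsCost {
    /** Dense LiveDocs cost: one bit per document, i.e. maxDoc/8 bytes. */
    static long denseBytes(long maxDoc) {
        return (maxDoc + 7) / 8; // round up to a whole byte
    }

    public static void main(String[] args) {
        // A 10M-doc segment pays ~1.2 MB whether 100 or 100K docs are deleted.
        System.out.println(denseBytes(10_000_000L));  // 1250000
        System.out.println(denseBytes(100_000_000L)); // 12500000
        // The benchmark tables report 1,250,040 bytes for 10M docs; the extra
        // ~40 bytes are presumably fixed object/array overhead.
    }
}
```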
   
   This change stores only the deleted document IDs, cutting memory substantially at low deletion rates: up to ~8x at 0.1% deletions (random pattern). The savings scale with the number of deletions rather than with segment size: for example, a 10M-document segment with 1% deletions (random pattern) drops from ~1.2 MB to ~800 KB.
   
   ## How It Works
   
   The implementation uses an adaptive approach:
   
   - **SparseLiveDocs** (<=1% deletions): Stores only deleted doc IDs using 
`SparseFixedBitSet`
   - **DenseLiveDocs** (>1% deletions): Uses traditional `FixedBitSet` for live 
docs
   
   `SparseLiveDocs` and `DenseLiveDocs` are complementary implementations, each optimized for its deletion rate. Both expose efficient iteration methods through the `LiveDocs` interface: `deletedDocsIterator()` for O(deletedDocs) iteration and `liveDocsIterator()` for O(liveDocs) iteration.
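   A minimal sketch of this adaptive scheme, with plain `java.util.BitSet` standing in for Lucene's `SparseFixedBitSet`/`FixedBitSet` (class and method names here are illustrative, not the PR's exact API):

```java
import java.util.BitSet;

public class AdaptiveLiveDocsSketch {
    /** The selection rule: sparse only when deletions are at most 1% of maxDoc. */
    static boolean useSparse(int deletedCount, int maxDoc) {
        return deletedCount * 100L <= maxDoc;
    }

    /** Sparse case: store only the DELETED doc IDs; a doc is live iff absent. */
    static final class SparseLiveDocs {
        private final BitSet deleted;
        SparseLiveDocs(BitSet deleted) { this.deleted = deleted; }
        boolean get(int docId) { return !deleted.get(docId); }
    }

    /** Dense case: store the LIVE docs directly, as today. */
    static final class DenseLiveDocs {
        private final BitSet live;
        DenseLiveDocs(BitSet live) { this.live = live; }
        boolean get(int docId) { return live.get(docId); }
    }

    public static void main(String[] args) {
        System.out.println(useSparse(100_000, 10_000_000)); // true: exactly 1%
        System.out.println(useSparse(500_000, 10_000_000)); // false: 5%

        BitSet deleted = new BitSet();
        deleted.set(7);
        SparseLiveDocs sparse = new SparseLiveDocs(deleted);
        System.out.println(sparse.get(3)); // true: live
        System.out.println(sparse.get(7)); // false: deleted
    }
}
```

   Note that both classes answer the same `get(docId)` question; only the underlying storage differs, which is what keeps the `Bits` contract unchanged for callers.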
   
   The `LiveDocs` interface approach also lets the test framework wrap these implementations with `AssertingLiveDocs`, which validates correctness during testing by delegating to the underlying `LiveDocs` methods while adding assertions. This preserves compatibility with the existing test infrastructure without requiring changes to test code.
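   The wrapper pattern can be sketched as follows; the minimal `LiveDocs` interface below is an assumption for illustration, and the real `AssertingLiveDocs` in Lucene's test framework performs more checks:

```java
public class AssertingLiveDocsSketch {
    /** Minimal stand-in for the LiveDocs interface described above. */
    interface LiveDocs {
        boolean get(int docId);
        int length();
    }

    /** Delegates every call to the wrapped implementation, adding checks. */
    static final class AssertingLiveDocs implements LiveDocs {
        private final LiveDocs in;
        AssertingLiveDocs(LiveDocs in) { this.in = in; }

        @Override public boolean get(int docId) {
            if (docId < 0 || docId >= in.length()) {
                throw new AssertionError("doc out of bounds: " + docId);
            }
            return in.get(docId); // delegate; behavior is unchanged
        }
        @Override public int length() { return in.length(); }
    }

    public static void main(String[] args) {
        LiveDocs raw = new LiveDocs() {
            @Override public boolean get(int docId) { return docId != 5; }
            @Override public int length() { return 10; }
        };
        LiveDocs checked = new AssertingLiveDocs(raw);
        System.out.println(checked.get(5)); // false: doc 5 is deleted
        System.out.println(checked.get(0)); // true
    }
}
```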
   
   The codec automatically selects the right representation when reading `.liv` files from disk. Benchmarks show the crossover point, where sparse and dense performance equalize, occurs around 5-10% deletions depending on the deletion pattern. At the 1% threshold, sparse delivers clear memory wins for most patterns and faster deleted-document iteration for all of them; even in the worst-case (uniform) distribution, iteration remains faster at <=1%. This conservative threshold guarantees predictable behavior while targeting the most common case, where sparse representations excel.
   
   On the API side, the `LiveDocs` interface is exposed through a new method, `LeafReader.getLiveDocsWithDeletedIterator()`, which enables efficient O(deletedDocs) iteration via `deletedDocsIterator()`. Consumers can check the iterator's `cost()` method to decide whether iterating deleted docs is beneficial for their use case. Rather than replacing the existing `getLiveDocs()` API (which would require extensive changes across the codebase), this approach lets consumers opt in to the optimization when they need it. Use cases like `PointTreeBulkCollector` (histogram correction, #15226) can now efficiently iterate only the deleted documents to adjust their counts, while existing code continues to work unchanged.
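   A hypothetical consumer-side sketch of this opt-in pattern; the interface below only mimics the `nextDoc()`/`cost()` shape of Lucene's `DocIdSetIterator`, and `correctCount` with its 1% gate is illustrative, not the PR's code:

```java
public class DeletedDocsDemo {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal stand-in for a doc-ID iterator with a cost() estimate. */
    interface DocIdIterator {
        int nextDoc();  // returns NO_MORE_DOCS when exhausted
        long cost();    // estimated number of docs this iterator visits
    }

    static DocIdIterator fromSortedDocIds(int[] docIds) {
        return new DocIdIterator() {
            private int pos = 0;
            public int nextDoc() {
                return pos < docIds.length ? docIds[pos++] : NO_MORE_DOCS;
            }
            public long cost() { return docIds.length; }
        };
    }

    /** Consumer pattern: only walk deleted docs when they are rare. */
    static long correctCount(long rawCount, DocIdIterator deletedDocs, int maxDoc) {
        if (deletedDocs.cost() > maxDoc / 100) {
            return -1; // too many deletions; caller should fall back to a full scan
        }
        long corrected = rawCount;
        for (int doc = deletedDocs.nextDoc(); doc != NO_MORE_DOCS; doc = deletedDocs.nextDoc()) {
            corrected--; // subtract each deleted doc from the raw count
        }
        return corrected;
    }

    public static void main(String[] args) {
        DocIdIterator it = fromSortedDocIds(new int[] {3, 17, 42});
        System.out.println(correctCount(1000, it, 10_000)); // 997
    }
}
```

   The `cost()` check is the key design point: it lets a consumer bail out cheaply when iterating deletions would be slower than its existing path.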
   
   ## Benchmark Results
   
   **10M document segment:**
   
   ### Random pattern (typical real-world scenario)
   
   | Deletion Rate | Dense Memory | Sparse Memory | Reduction | Deleted Docs Iter | Speedup |
   |---------------|--------------|---------------|-----------|-------------------|---------|
   | 0.1% | 1,250,040 bytes | 163,776 bytes | 7.6x | 2.75 ms -> 0.09 ms | 31.7x faster |
   | 1% | 1,250,040 bytes | 804,008 bytes | 1.6x | 3.25 ms -> 0.88 ms | 3.7x faster |
   | 5% | 1,250,040 bytes | 1,318,440 bytes | 0.9x* | 5.57 ms -> 4.56 ms | 1.2x faster |
   
   (*) marks entries where sparse is worse than dense.
   
   ### Clustered pattern (best case for sparse)
   
   | Deletion Rate | Dense Memory | Sparse Memory | Reduction | Deleted Docs Iter | Speedup |
   |---------------|--------------|---------------|-----------|-------------------|---------|
   | 0.1% | 1,250,040 bytes | 30,760 bytes | 40.6x | 2.75 ms -> 0.08 ms | 36.6x faster |
   | 1% | 1,250,040 bytes | 42,376 bytes | 29.5x | 2.95 ms -> 0.75 ms | 3.9x faster |
   | 5% | 1,250,040 bytes | 93,856 bytes | 13.3x | 2.98 ms -> 3.75 ms | 0.8x slower* |
   
   ### Uniform pattern (worst case for sparse)
   
   | Deletion Rate | Dense Memory | Sparse Memory | Reduction | Deleted Docs Iter | Speedup |
   |---------------|--------------|---------------|-----------|-------------------|---------|
   | 0.1% | 1,250,040 bytes | 185,624 bytes | 6.7x | 2.8 ms -> 0.07 ms | 40.1x faster |
   | 1% | 1,250,040 bytes | 1,318,368 bytes | 0.9x* | 2.75 ms -> 0.84 ms | 3.3x faster |
   | 5% | 1,250,040 bytes | 1,318,552 bytes | 0.9x* | 2.9 ms -> 3.85 ms | 0.8x slower* |
   
   **Why the conservative 1% threshold?** These benchmarks show strongly pattern-dependent behavior:
   - **CLUSTERED** (best case): sparse wins on memory even at 5% (13.3x reduction) but loses on iteration speed
   - **UNIFORM** (worst case): sparse already uses more memory at 1% and loses on iteration speed at 5%
   - **RANDOM** (typical): middle ground between clustered and uniform
   
   With the 1% threshold, sparse is used only where it delivers clear memory wins for most deletion patterns and never performs catastrophically worse. At <=1%, sparse provides a 1.6-40x memory reduction in typical and best-case distributions and a 3-40x iteration speedup across all patterns.
   
   ### Pathological case: maximally scattered deletions
   
   This PR was also benchmarked against a worst-case scenario with deletions maximally scattered across the bitset: a 1.5625% (= 1/64) deletion rate, i.e. one deletion per 64-bit word:
   
   | Segment Size | Dense Memory | Sparse Memory | Overhead | Deleted Docs Iter | Speedup |
   |--------------|--------------|---------------|----------|-------------------|---------|
   | 100K | 12,544 bytes | 13,376 bytes | +6.6% | 0.03 ms -> 0.01 ms | 4.3x faster |
   | 500K | 62,544 bytes | 66,032 bytes | +5.6% | 0.13 ms -> 0.03 ms | 4.3x faster |
   | 1M | 125,040 bytes | 131,944 bytes | +5.5% | 0.27 ms -> 0.06 ms | 4.3x faster |
   | 5M | 625,040 bytes | 659,416 bytes | +5.5% | 1.34 ms -> 0.32 ms | 4.1x faster |
   | 10M | 1,250,040 bytes | 1,318,552 bytes | +5.5% | 2.70 ms -> 0.65 ms | 4.2x faster |
   | 50M | 6,250,040 bytes | 6,591,904 bytes | +5.5% | 13.62 ms -> 3.26 ms | 4.2x faster |
   | 100M | 12,500,040 bytes | 13,183,712 bytes | +5.5% | 27.47 ms -> 6.52 ms | 4.2x faster |
   
   Even in this unfavorable scenario, where sparse uses 5-6% more memory, the 
iteration speedup remains stable at ~4x across all segment sizes. This shows 
that the overhead remains bounded and predictable, making it an acceptable 
trade-off for iteration-heavy workloads.
   
   ## Backward Compatibility
   
   This change introduces no breaking changes:
   - Disk format remains unchanged (`.liv` files are fully compatible)
   - Existing indexes require no reindexing
   - Code using the `Bits` API continues to function as before
   
   **Note on disk format:** This PR keeps the existing Lucene90 `.liv` format 
(dense bitset on disk) to minimize changes and maintain compatibility. When 
reading, `Lucene90LiveDocsFormat` converts the on-disk dense representation to 
the in-memory sparse representation for segments with <=1% deletions. This 
conversion has negligible overhead since it only happens when segments are 
loaded, not during queries. A follow-up PR could introduce a new codec version 
that writes sparse deletions natively to disk, eliminating the conversion step 
entirely and reducing disk space for `.liv` files.
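   The read-time conversion might look roughly like this, with plain `java.util.BitSet` standing in for the on-disk `FixedBitSet` and the in-memory `SparseFixedBitSet` (an illustrative sketch, not `Lucene90LiveDocsFormat`'s actual code):

```java
import java.util.BitSet;

public class LivReadSketch {
    /** On read: keep dense live-docs, or flip to a sparse set of deleted IDs. */
    static BitSet deletedDocsIfSparse(BitSet liveOnDisk, int maxDoc) {
        int deletedCount = maxDoc - liveOnDisk.cardinality();
        if (deletedCount * 100L > maxDoc) {
            return null; // >1% deleted: stay with the dense live-docs bitset
        }
        // <=1% deleted: invert into a set holding only the deleted doc IDs.
        BitSet deleted = new BitSet(maxDoc);
        for (int doc = liveOnDisk.nextClearBit(0); doc < maxDoc;
             doc = liveOnDisk.nextClearBit(doc + 1)) {
            deleted.set(doc);
        }
        return deleted;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000;
        BitSet live = new BitSet(maxDoc);
        live.set(0, maxDoc); // all live...
        live.clear(42);      // ...except doc 42
        BitSet deleted = deletedDocsIfSparse(live, maxDoc);
        System.out.println(deleted.cardinality()); // 1
        System.out.println(deleted.get(42));       // true
    }
}
```

   Since the inversion walks only the clear bits, its cost is proportional to the number of deletions plus a single pass over the on-disk bitset, which is why doing it once at segment-load time is cheap.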
   
   ---
   
   Fixes #13084
   

