salvatorecampagna commented on issue #13084:
URL: https://github.com/apache/lucene/issues/13084#issuecomment-3505091191

   Yes, I agree that the memory overhead ratio captures both deletion rate 
**AND** clustering at the block level, making it an excellent proxy to 
understand when sparse is beneficial.
   
   However, in practice, `Lucene90LiveDocsFormat` only has access to `maxDoc` 
and `delCount` (via `SegmentCommitInfo`) when deciding which implementation to 
use. We don't know the deletion distribution pattern until after loading. Using 
memory overhead as a runtime criterion would require allocating the sparse 
structure first, measuring its footprint, then potentially discarding it. 
Instead, I'm using benchmarks to find the **deletion rate threshold** 
(`delCount/maxDoc`) where memory overhead becomes unacceptable, which can 
be checked up front, before allocation.
   
   **So I'm using deletion rate as a proxy for memory overhead**, with the 
benchmarks validating that this correlation holds reliably across different 
workloads.
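
   As a rough illustration of the up-front check described above, here is a minimal sketch. It assumes only that `maxDoc` and `delCount` are available as plain integers. The class name `LiveDocsChoice`, the method `useSparse`, and the placeholder threshold value are hypothetical and do not come from Lucene or from any benchmark result.

```java
/** Hypothetical helper: picks an implementation from delCount/maxDoc alone. */
final class LiveDocsChoice {

  // Placeholder value for illustration only. The real threshold is exactly
  // what the benchmarks are meant to determine.
  static final double HYPOTHETICAL_MAX_SPARSE_DELETION_RATE = 0.01;

  private LiveDocsChoice() {}

  /**
   * Returns true when the deletion rate is at or below the given threshold,
   * i.e. when a sparse representation would be chosen under this sketch.
   * No allocation is needed to make the decision.
   */
  static boolean useSparse(int maxDoc, int delCount, double maxSparseDeletionRate) {
    if (maxDoc <= 0 || delCount < 0 || delCount > maxDoc) {
      throw new IllegalArgumentException(
          "invalid counts: maxDoc=" + maxDoc + ", delCount=" + delCount);
    }
    double deletionRate = (double) delCount / maxDoc;
    return deletionRate <= maxSparseDeletionRate;
  }
}
```

   With the placeholder threshold, a segment with 1,000,000 docs and 5,000 deletions (0.5%) would pick sparse, while one with 200,000 deletions (20%) would fall back to dense.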
   
   I'm validating this empirically with different deletion patterns (RANDOM, 
CLUSTERED, UNIFORM) across various deletion rates and segment sizes. The ROI 
analysis should reveal a clear decision boundary, identifying where sparse 
provides significant speedup with acceptable memory cost versus where the 
overhead outweighs the benefit.
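
   To show why the deletion pattern matters at a fixed deletion rate, here is a toy model, not the benchmark code. It assumes a sparse structure that only pays for 64-doc blocks containing at least one deletion. The block size, the class `BlockOverheadSketch`, and the two generators are assumptions made for illustration.

```java
import java.util.BitSet;

/** Toy model: counts how many fixed-size blocks a set of deletions touches. */
final class BlockOverheadSketch {

  // Assumed block size (one 64-bit word per block); not taken from Lucene.
  static final int BLOCK_BITS = 64;

  private BlockOverheadSketch() {}

  /** Number of blocks that contain at least one deleted doc. */
  static int touchedBlocks(int[] deletedDocs, int maxDoc) {
    BitSet blocks = new BitSet((maxDoc + BLOCK_BITS - 1) / BLOCK_BITS);
    for (int doc : deletedDocs) {
      blocks.set(doc / BLOCK_BITS);
    }
    return blocks.cardinality();
  }

  /** Clustered pattern: the first delCount doc IDs are deleted. */
  static int[] clustered(int delCount) {
    int[] docs = new int[delCount];
    for (int i = 0; i < delCount; i++) {
      docs[i] = i;
    }
    return docs;
  }

  /** Uniform pattern: deletions evenly spaced across [0, maxDoc). */
  static int[] uniform(int delCount, int maxDoc) {
    int[] docs = new int[delCount];
    for (int i = 0; i < delCount; i++) {
      docs[i] = (int) ((long) i * maxDoc / delCount);
    }
    return docs;
  }
}
```

   In this model, 640 deletions in a 64,000-doc segment (1%) touch 10 blocks when clustered but 640 blocks when evenly spaced, so the same deletion rate can imply very different sparse footprints.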
   
   The benchmark results should confirm whether a simple deletion rate 
threshold is sufficient, or whether the relationship between deletion rate and 
memory overhead varies enough across different patterns to require a more 
sophisticated approach (though I hope the simple threshold works!).
   
   I'll also test pathological worst-case scenarios to ensure the threshold 
remains robust under adversarial conditions.
   
   Benchmarks are running, by the way. There are quite a few of them, so they 
will need a few hours :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

