jainankitk commented on issue #13084:
URL: https://github.com/apache/lucene/issues/13084#issuecomment-3500159963

   Thanks @salvatorecampagna for elaborating on the approach. The execution plans 
look pretty reasonable to me. A couple of questions to understand this better:
   
   > Note that runtime-only requires an O(maxDoc) scan during segment open to 
convert from the dense on-disk format to sparse in-memory representation for 
sparse cases.
   
   I am assuming that the O(maxDoc) cost is only incurred when we actually 
build the sparse in-memory representation. Also, does this involve reading 
additional data from disk during segment open? If yes, it should be bounded by 
the size of live docs, right?
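
   To make the question concrete, here is a minimal sketch of the conversion as 
I understand it. It uses `java.util.BitSet` as a stand-in for the dense live 
docs; the class and method names are hypothetical and not Lucene APIs.

```java
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical illustration only: not part of Lucene.
final class SparseDeletes {
    // Scans every doc ID in [0, maxDoc) once and collects the IDs whose
    // live bit is clear, i.e. the deleted docs, in ascending order.
    // The single pass over all doc IDs is the O(maxDoc) cost in question.
    static int[] deletedDocs(BitSet liveDocs, int maxDoc) {
        return IntStream.range(0, maxDoc)
                .filter(doc -> !liveDocs.get(doc))
                .toArray();
    }
}
```

   In this sketch the scan touches only the live docs bits, so no other data 
would need to be read.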
   
   > This would validate (and potentially adjust) the 20% starting point.
   
   Since we are looking at `HistogramCollector` as one of the primary use cases, 
20% is slightly high imo. For example, if we have 1M documents in a segment 
and 200k deleted documents, it might be better to just take the less efficient 
path instead of collecting using `PointTreeTraversal` first and then 
correcting retrospectively by iterating over the 200k deleted documents and 
accessing their doc values. That being said, we can come up with a better 
threshold based on the benchmark results.
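
   For reference, a minimal sketch of how such a threshold check could look. 
The class and method names are hypothetical, and treating the threshold as 
inclusive is my assumption; the proposal does not specify it.

```java
// Hypothetical illustration only: not part of Lucene.
final class DeleteRatioThreshold {
    // Returns true when the fraction of deleted docs in the segment is at or
    // below maxDeletedRatio (inclusive bound is an assumption).
    static boolean useSparseCorrection(int maxDoc, int numDeleted, double maxDeletedRatio) {
        if (maxDoc <= 0) {
            return false; // empty segment: nothing to correct
        }
        return (double) numDeleted / maxDoc <= maxDeletedRatio;
    }
}
```

   With 1M documents and 200k deletes, this segment sits exactly at a 20% 
threshold and would still take the correction path, while a lower threshold 
such as 10% would send it to the regular path.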


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

