Re: [I] Bias Towards Short Text Segments in Vector Search Results [lucene]

via GitHub Tue, 16 Jun 2026 05:44:34 -0700


amoll75 commented on issue #16263:
URL: https://github.com/apache/lucene/issues/16263#issuecomment-4718862448


   I understand the concern about API complexity.
   
   What I had in mind was not necessarily a mechanism specific to passage 
length, but a more generic way to apply document-level or embedding-level 
factors to vector similarity scores.
   
   For example, the scaling information could simply be stored in one or more 
numeric (e.g. double) fields and applied multiplicatively to the vector score 
during retrieval. This would enable a wide range of use cases, including 
passage-length normalization, recency boosting, authority scores, popularity 
signals, quality metrics, or other application-specific ranking factors.
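
   To make the multiplicative idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: `ScaledVectorScoring`, `Hit`, and `topK` are hypothetical names, and the `factor` value stands in for whatever would be read from the numeric field. It only illustrates how scaling the similarity before ranking can change which documents make the top k.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names are Lucene APIs.
final class ScaledVectorScoring {

    /** A candidate hit: document id, raw vector similarity, and a per-document factor. */
    static final class Hit {
        final int docId;
        final double similarity; // raw vector similarity score
        final double factor;     // assumed to come from a numeric (double) field

        Hit(int docId, double similarity, double factor) {
            this.docId = docId;
            this.similarity = similarity;
            this.factor = factor;
        }

        /** The similarity scaled multiplicatively by the stored factor. */
        double scaledScore() {
            return similarity * factor;
        }
    }

    /** Returns up to k hits ordered by scaled score, best first. */
    static List<Hit> topK(List<Hit> candidates, int k) {
        List<Hit> sorted = new ArrayList<>(candidates);
        // Sort descending by the scaled score rather than the raw similarity.
        sorted.sort((a, b) -> Double.compare(b.scaledScore(), a.scaledScore()));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }
}
```

   With a factor of 0.5 on a short passage, for example, a raw similarity of 0.90 drops to 0.45 and falls behind a longer passage scoring 0.80 with factor 1.0. If the factor is applied only after the top k have been collected, candidates like that longer passage may already have been cut, which is why applying it during retrieval matters.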
   
   Alternatively, a more flexible approach could be based on function queries, 
allowing users to define how vector similarity should be combined with document 
attributes.
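
   As a rough sketch of the function-based variant, the combination rule can be passed in as a function rather than fixed to multiplication. Again this is plain Java with hypothetical names (`CombinedScore`, `score`, `MULTIPLY`, `weightedSum`), not an existing or proposed Lucene API.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical illustration only: none of these names are Lucene APIs.
final class CombinedScore {

    /** Multiplicative combination: similarity scaled by the attribute. */
    static final DoubleBinaryOperator MULTIPLY = (similarity, attribute) -> similarity * attribute;

    /** Linear blend: weight * similarity + (1 - weight) * attribute. */
    static DoubleBinaryOperator weightedSum(double weight) {
        return (similarity, attribute) -> weight * similarity + (1.0 - weight) * attribute;
    }

    /** Applies a user-supplied rule to a vector similarity and a document attribute. */
    static double score(double similarity, double attribute, DoubleBinaryOperator combiner) {
        return combiner.applyAsDouble(similarity, attribute);
    }
}
```

   A caller would pick `CombinedScore.MULTIPLY` for simple scaling, or `CombinedScore.weightedSum(0.75)` to blend the similarity with a normalized attribute such as a popularity score.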
   
   I fully appreciate that either approach would require significant 
implementation effort and would introduce additional API surface area. However, 
I believe the resulting functionality would be broadly useful, as it would 
allow vector retrieval to be combined with structured ranking signals without 
requiring large candidate expansions and external reranking pipelines.
   
   In my view, the increase in functionality could justify the additional 
complexity, especially for production search systems where vector similarity is 
only one component of the final ranking.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


