amoll75 commented on issue #16263: URL: https://github.com/apache/lucene/issues/16263#issuecomment-4718862448
I understand the concern about API complexity. What I had in mind was not necessarily a mechanism specific to passage length, but a more generic way to apply document-level or embedding-level factors to vector similarity scores.

For example, the scaling information could simply be stored in one or more numeric (e.g. double) fields and applied multiplicatively to the vector score during retrieval. This would enable a wide range of use cases, including passage-length normalization, recency boosting, authority scores, popularity signals, quality metrics, or other application-specific ranking factors. Alternatively, a more flexible approach could be based on function queries, allowing users to define how vector similarity should be combined with document attributes.

I fully appreciate that either approach would require significant implementation effort and would introduce additional API surface area. However, I believe the resulting functionality would be broadly useful, as it would allow vector retrieval to be combined with structured ranking signals without requiring large candidate expansions and external reranking pipelines. In my view, the increase in functionality could justify the additional complexity, especially for production search systems where vector similarity is only one component of the final ranking.
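To make the multiplicative variant concrete, here is a minimal sketch in plain Java. It is an illustration only and does not use or propose any Lucene API: the class `ScaledVectorScoring`, its methods, and the `factors` array (standing in for a per-document numeric field) are all hypothetical names, and the brute-force scan stands in for whatever the real retrieval path would be. It compares applying the factor during scoring with picking candidates by raw similarity and reranking them afterwards.

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names are Lucene APIs.
final class ScaledVectorScoring {

  // Dot-product similarity between two equal-length vectors.
  static double dot(float[] a, float[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Ids of the k largest scores, best first; ties go to the lower id.
  static int[] topK(double[] scores, int k) {
    Integer[] ids = new Integer[scores.length];
    for (int i = 0; i < ids.length; i++) {
      ids[i] = i;
    }
    Arrays.sort(ids, (x, y) -> {
      int byScore = Double.compare(scores[y], scores[x]); // descending score
      return byScore != 0 ? byScore : Integer.compare(x, y);
    });
    int n = Math.min(k, ids.length);
    int[] out = new int[n];
    for (int i = 0; i < n; i++) {
      out[i] = ids[i];
    }
    return out;
  }

  // Factor applied during scoring: every document is ranked by similarity * factor.
  static int[] topKScaled(float[][] vectors, double[] factors, float[] query, int k) {
    double[] scores = new double[vectors.length];
    for (int doc = 0; doc < vectors.length; doc++) {
      scores[doc] = dot(vectors[doc], query) * factors[doc];
    }
    return topK(scores, k);
  }

  // Baseline: select candidates by raw similarity, then rerank only those by the factor.
  static int[] topKThenRerank(
      float[][] vectors, double[] factors, float[] query, int candidates, int k) {
    double[] raw = new double[vectors.length];
    for (int doc = 0; doc < vectors.length; doc++) {
      raw[doc] = dot(vectors[doc], query);
    }
    int[] cand = topK(raw, candidates);
    double[] scores = new double[vectors.length];
    Arrays.fill(scores, Double.NEGATIVE_INFINITY); // non-candidates can never win
    for (int doc : cand) {
      scores[doc] = raw[doc] * factors[doc];
    }
    return topK(scores, Math.min(k, cand.length));
  }

  public static void main(String[] args) {
    float[][] vectors = {{1.0f, 0.0f}, {0.9f, 0.1f}, {0.8f, 0.2f}, {0.1f, 0.9f}};
    double[] factors = {0.5, 0.6, 1.5, 2.0}; // made-up per-document factors
    float[] query = {1.0f, 0.0f};
    System.out.println(Arrays.toString(topKScaled(vectors, factors, query, 2)));
    System.out.println(Arrays.toString(topKThenRerank(vectors, factors, query, 2, 2)));
  }
}
```

With this made-up data, document 2 has the highest scaled score but only the third-highest raw similarity, so reranking two raw candidates never sees it. The baseline would need a larger candidate set to recover it, which is the candidate-expansion cost mentioned above.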
