Hi Chris,

While this is theoretically possible, it would require rewriting every query that you might want to run, so it would be a huge investment.
In general, doing something like that is a bad idea since it requires computing highlights for many documents that may not make it to the top-k hits.

On Thu, Nov 4, 2021 at 5:44 PM Hahn, Christopher (TR Technology) <[email protected]> wrote:

> Hello Lucene Developers,
>
> We’re working on a search service which uses lucene indexes. One of the
> things I’m hoping to find is different places where we can plug in our
> custom classes during the search process.
>
> This first use case is for highlighting. The legacy search engine we use
> collects all term positions for highlighting during the search process. So
> everything happens all at once instead of the
> search-first-then-highlight-model. For how we use highlighting, this is
> more efficient for us, instead of reprocessing the query.
>
> One thought I had was creating a custom scorer that would be called during
> search, and it would gather highlights in addition to scoring. I think this
> would be especially useful for proximity queries, or any other scoring
> based on positions of words in the document. Instead of advancing the term
> vectors and finding phrases in a document at search time, and then doing it
> AGAIN at highlight time, if there was a way to access the data used by the
> search process.
>
> Any suggestions, comments, or references that would enlighten me would be
> appreciated. I’ve had great difficulty finding helpful documents as I get
> to know Lucene.
>
> Thanks,
>
> Chris Hahn
> This e-mail is for the sole use of the intended recipient and contains
> information that may be privileged and/or confidential. If you are not an
> intended recipient, please notify the sender by return e-mail and delete
> this e-mail and any attachments. Certain required legal entity disclosures
> can be accessed on our website:
> https://www.thomsonreuters.com/en/resources/disclosures.html
>

--
Adrien
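
As a rough illustration of the top-k point made in the reply, here is a toy Python sketch. It is not Lucene code: the `search` function, the random scores, and the counter are all made up for this example. It only counts how many documents would need highlighting if highlights were gathered for every match during collection, compared with highlighting just the final hits afterwards.

```python
import heapq
import random


def search(scores, k, highlight_during_search):
    """Toy top-k search over per-document scores.

    Returns (top_k_doc_ids, highlight_calls). Hypothetical helper, not a Lucene API.
    """
    highlight_calls = 0
    heap = []  # min-heap of (score, doc_id) holding the best k hits seen so far
    for doc_id, score in enumerate(scores):
        if highlight_during_search:
            # Eager model: every matching document pays the highlighting cost,
            # because we do not yet know whether it will survive into the top k.
            highlight_calls += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    top = [doc_id for _, doc_id in sorted(heap, reverse=True)]
    if not highlight_during_search:
        # Search-first-then-highlight model: only the final hits are highlighted.
        highlight_calls = len(top)
    return top, highlight_calls


random.seed(0)
scores = [random.random() for _ in range(10_000)]  # pretend every document matches

eager_top, eager_calls = search(scores, 10, highlight_during_search=True)
lazy_top, lazy_calls = search(scores, 10, highlight_during_search=False)

print("same hits:", eager_top == lazy_top)
print("highlight calls (during search):", eager_calls)
print("highlight calls (after search):", lazy_calls)
```

Both runs return the same hits; the difference is only in how many documents get highlighted. How much that matters in practice depends on how many documents match the query and how expensive highlighting is per document, which this sketch does not model.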
