>> the answer has never been satisfactory
Is this the original question?
http://www.nabble.com/Which-field-matched---tf4141549.html#a11780757
What actually formed the basis of a document match is hidden in a tree of
heterogeneous Query objects and to be efficient their match output is
limited to document ids and scores- not some detailed analysis of
which sections of a document matched. It is therefore hard/impossible
to have any highlighting solution which provides answers for all query
types and the existing Highlighter relies on a rough heuristic where
QueryTermExtractor output is used to find the list of query terms used and
field TokenStreams are analyzed for content.
You mention Highlighter performance would be bad for wildcard queries. Have you
tried it? If it does turn out to be bad (many wildcard variants produced) might
I suggest the following:
1) Dissect the unrewritten Query and find all WildcardQuery objects
2) Create a custom analyzer that re-implements the wildcard logic and produces
a highlighter-friendly token stream i,e
Given a query of
Fred W*
and data of
Fred West was arrested
the analyzer would produce:
Fred [W*|West] was arrested
..where the tokens "W*" and "West" appear at the same position
3) Add a special wildcard term (W*) to the list of Query terms given to the
Highlighter. This would then match with the W* injected into the content in
step 2)
This would avoid the overhead of picking through all the wildcard variants
produced by the wildcardQuery but at the cost of extra coding on your part and
the runtime cost of re-executing wildcard logic on all terms in the selected
documents' TokenStreams. The difference in runtime cost may prove minimal.
Cheers
Mark
___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]