I do use the NullFragmenter now. I have no interest in the fragments at the moment, just in showing hits on the source document. It would be great if I could just show the real hits though. The span approach seems to work fine for me. I have even tested the highlighting using my sentence and paragraph proximity search queries from my query parser. These use a modified NotSpan (I call it WithinSpan) within an unbound NearSpan. I did a few queries that combine that structure with wildcard and boolean queries...everything appeared to work grand -- I got all the correct highlights. I just have to combine the highlights (spans) and refine my code (and that color comment Otis made is something I am interested in well -- it would be great to have the words found in a single spanquery be the same color, or a similar shade).

- Mark

markharw00d wrote:
>>For what it's worth Mark (Miller), there *is* a need for "just highlight the query terms without trying to get excerpts" functionality
>>- something a la Google cache (different colours...mmm, nice).

FWIW, the existing highlighter doesn't *have* to fragment - just pass a NullFragmenter to the highlighter. Ideally we'd have one implementation that tackles phrase support and preserves (optional) support for selecting fragments. I can see that to achieve this the existing highlighter design would need to change. Currently the highlighter identifies fragments first (typically using an implementation which arbitrarily chops text after 'n' words) and then selects which of these fragments have the highest density of high-scoring query terms. This logic would need to change to :
1) Use QuerySpansExtractor to identify all the *spans* in the document
2) Use a sliding window to select fragments, taking care to select fragments that wholly contain spans, rather than selecting only part of a span.
3) Mark up the hits.
Clearly, for people uninterested in selecting fragments, step 2 can be skipped.

Cheers
Mark


___________________________________________________________ All new Yahoo! Mail "The new Interface is stunning in its simplicity and ease of use." - PC Magazine http://uk.docs.yahoo.com/nowyoucan.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to