>>For what it's worth Mark (Miller), there *is* a need for "just
highlight the query terms without trying to get excerpts" functionality
>>- something a la Google cache (different colours...mmm, nice).
FWIW, the existing highlighter doesn't *have* to fragment - just pass a
NullFragmenter to the highlighter.
Ideally we'd have one implementation that tackles phrase support and
preserves (optional) support for selecting fragments. I can see that to
achieve this the existing highlighter design would need to change.
Currently the highlighter identifies fragments first (typically using an
implementation which arbitrarily chops text after 'n' words) and then
selects which of these fragments have the highest density of
high-scoring query terms. This logic would need to change to :
1) Use QuerySpansExtractor to identify all the *spans* in the document
2) Use a sliding window to select fragments, taking care to select
fragments that wholly contain spans, rather than selecting only part of
a span.
3) Mark up the hits.
Clearly, for people uninterested in selecting fragments, step 2 can be
skipped.
Cheers
Mark
___________________________________________________________
All new Yahoo! Mail "The new Interface is stunning in its simplicity and ease of use." - PC Magazine
http://uk.docs.yahoo.com/nowyoucan.html
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]