I do use the NullFragmenter now. I have no interest in the fragments at
the moment, just in showing hits on the source document. It would be
great if I could just show the real hits though. The span approach seems
to work fine for me. I have even tested the highlighting using my
sentence and paragraph proximity search queries from my query parser.
These use a modified NotSpan (I call it WithinSpan) within an unbound
NearSpan. I did a few queries that combine that structure with wildcard
and boolean queries...everything appeared to work grand -- I got all the
correct highlights. I just have to combine the highlights (spans) and
refine my code (and that color comment Otis made is something I am
interested in as well -- it would be great to have the words found in a
single SpanQuery be the same color, or a similar shade).
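
For anyone following along, the "combine the highlights" step can be sketched
roughly as below. This is only an illustration under my own assumptions, not
code from the highlighter: SpanMerger and merge are hypothetical names, and
each span is taken to be an int[]{start, end} pair of character offsets with
an exclusive end.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: merges overlapping or touching
// offset ranges so that each stretch of text is marked up only once.
class SpanMerger {

    // Each span is an int[]{start, end} with an exclusive end offset.
    static List<int[]> merge(List<int[]> spans) {
        // Sort a copy by start offset so overlapping spans become neighbours.
        List<int[]> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((int[] s) -> s[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] span : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && span[0] <= merged.get(last)[1]) {
                // Overlaps or touches the previous range: extend it.
                int[] prev = merged.get(last);
                prev[1] = Math.max(prev[1], span[1]);
            } else {
                // Starts after the previous range ends: begin a new one.
                merged.add(new int[] {span[0], span[1]});
            }
        }
        return merged;
    }
}
```

Feeding it the spans (10,15), (0,4), (3,8) and (15,20) gives back two ranges,
(0,8) and (10,20), which can then be wrapped in highlight tags in one pass.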
- Mark
markharw00d wrote:
>>For what it's worth Mark (Miller), there *is* a need for "just
>>highlight the query terms without trying to get excerpts" functionality
>>- something a la Google cache (different colours...mmm, nice).
FWIW, the existing highlighter doesn't *have* to fragment - just pass
a NullFragmenter to the highlighter.
Ideally we'd have one implementation that tackles phrase support and
preserves (optional) support for selecting fragments. I can see that
to achieve this the existing highlighter design would need to change.
Currently the highlighter identifies fragments first (typically using
an implementation which arbitrarily chops text after 'n' words) and
then selects which of these fragments have the highest density of
high-scoring query terms. This logic would need to change to:
1) Use QuerySpansExtractor to identify all the *spans* in the document
2) Use a sliding window to select fragments, taking care to select
fragments that wholly contain spans, rather than selecting only part
of a span.
3) Mark up the hits.
Clearly, for people uninterested in selecting fragments, step 2 can be
skipped.
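
A rough sketch of what step 2 might look like, purely to make the idea
concrete. It is not the existing highlighter code and does not call
QuerySpansExtractor: SpanWindowSelector and bestWindow are hypothetical
names, spans are assumed to arrive as int[]{start, end} character offsets
with an exclusive end, and a window is scored simply by how many spans it
wholly contains (a real version would presumably weight spans by score).

```java
import java.util.List;

// Hypothetical sketch of fragment selection driven by spans rather than by
// pre-chopped fragments. Not part of Lucene.
class SpanWindowSelector {

    // Returns {start, end} of the window of windowSize characters that wholly
    // contains the most spans, or null if no span fits inside a window.
    static int[] bestWindow(List<int[]> spans, int windowSize) {
        int[] best = null;
        int bestCount = 0;
        // Sliding the window so it begins at a span's start never drops a
        // span it already contained, so only those positions need checking.
        for (int[] anchor : spans) {
            int winStart = anchor[0];
            int winEnd = winStart + windowSize;
            int count = 0;
            for (int[] span : spans) {
                // Count only spans lying completely inside the window, so a
                // fragment never shows just part of a span.
                if (span[0] >= winStart && span[1] <= winEnd) {
                    count++;
                }
            }
            if (count > bestCount) {
                bestCount = count;
                best = new int[] {winStart, winEnd};
            }
        }
        return best;
    }
}
```

With spans (0,5), (20,30), (28,45), (40,48) and a window of 30 characters,
this picks the window (20,50), which holds three whole spans. The caller
would still need to clamp the window to the length of the text.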
Cheers
Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------