>> Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)?
I haven't conducted a survey, but it's the typical web search engine
scenario: select only a small subset of the matching document content
for display in SERPs. I would expect that to be a pretty commonplace
requirement for which we should retain a solution.
Maybe a new highlighter with no attempt at summarising could more easily
address phrase support for small pieces of content. It will always be
hard to faithfully represent all possible query match logic,
especially if there are NOTs, ANDs and ORs mixed in with all the term
proximity logic, e.g. NotNear. Some compromise is required. I did suggest
that spans may be a better basis for highlighting than terms, and pointed
at some existing code to get you along this path. See here:
http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
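To make the span idea concrete, here is a minimal sketch in plain Java. It is
not the contrib Highlighter and not the code linked above, and it does not use
any Lucene API. It assumes the text has already been analyzed into a list of
lower-cased tokens and that the query is a single exact phrase. The class and
method names (PhraseSpanSketch, findPhraseSpans, highlight) are made up for
illustration. The point is only that highlighting whole matched spans, rather
than individual terms, leaves "president" in "president clinton" alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of Lucene or its Highlighter.
final class PhraseSpanSketch {

    // Returns [start, end) token positions of every exact, in-order
    // occurrence of the phrase in the token list.
    static List<int[]> findPhraseSpans(List<String> tokens, List<String> phrase) {
        List<int[]> spans = new ArrayList<>();
        if (phrase.isEmpty()) {
            return spans;
        }
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                spans.add(new int[] {i, i + phrase.size()});
            }
        }
        return spans;
    }

    // Wraps each run of span-covered tokens in <B>...</B>. Tokens are
    // re-joined with single spaces, so original whitespace is not preserved.
    static String highlight(List<String> tokens, List<int[]> spans) {
        boolean[] marked = new boolean[tokens.size()];
        for (int[] span : spans) {
            for (int p = span[0]; p < span[1]; p++) {
                marked[p] = true;
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            boolean opens = marked[i] && (i == 0 || !marked[i - 1]);
            boolean closes = marked[i] && (i == tokens.size() - 1 || !marked[i + 1]);
            if (opens) {
                out.append("<B>");
            }
            out.append(tokens.get(i));
            if (closes) {
                out.append("</B>");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("president", "clinton", "met", "president", "kennedy");
        List<int[]> spans = findPhraseSpans(tokens, Arrays.asList("president", "kennedy"));
        // prints: president clinton met <B>president kennedy</B>
        System.out.println(highlight(tokens, spans));
    }
}
```

A real implementation would have to get its spans from the rewritten query
and map token positions back to character offsets in the original text. This
sketch sidesteps both, and it ignores slop, NOTs and the other proximity logic
mentioned above.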
There are also a couple of other Highlighter packages contributed
recently, which I listed in my previous mail. I simply haven't had the
time to look at them in detail, but they may be useful. Has anyone had
any experience of those?
>> every new highlight has to be compared against every previous
highlight for overlap
Yes, Analyzers that produce overlapping tokens are an added complication
when implementing highlighting logic. I think we have a reasonable JUnit
test containing several of the more exotic analyzer scenarios which you
could/should use for testing any other highlighter implementation.
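On the n^2 worry itself: combining overlapping highlights does not need every
pair compared. If the highlights are sorted by start offset, one pass that
extends the most recently emitted range is enough, so the cost is dominated by
the sort (O(n log n)). The following is a minimal sketch under assumptions,
not code from the existing Highlighter: highlights are taken to be
[start, end) character offsets held in int pairs, and the class and method
names (HighlightMergeSketch, mergeOverlaps) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: not part of Lucene or its Highlighter.
final class HighlightMergeSketch {

    // Merges overlapping or touching [start, end) ranges.
    // The input list and its arrays are left unmodified.
    static List<int[]> mergeOverlaps(List<int[]> highlights) {
        List<int[]> sorted = new ArrayList<>(highlights);
        sorted.sort(Comparator.comparingInt((int[] h) -> h[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] h : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && h[0] <= last[1]) {
                // Starts inside (or exactly at the end of) the previous
                // range, so extend that range instead of adding a new one.
                last[1] = Math.max(last[1], h[1]);
            } else {
                merged.add(new int[] {h[0], h[1]});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<int[]> merged = mergeOverlaps(Arrays.asList(
                new int[] {10, 19}, new int[] {0, 9},
                new int[] {5, 12}, new int[] {30, 35}));
        // prints [0, 19] and [30, 35], one per line
        for (int[] m : merged) {
            System.out.println(Arrays.toString(m));
        }
    }
}
```

Because ends are exclusive, the `<=` comparison also joins ranges that merely
touch. Use `<` instead if adjacent highlights should stay separate.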
Cheers,
Mark
Mark Miller wrote:
Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)? Isn't it you that suggested
turning the query into a SpanQuery, extracting the spans and then
doing the highlighting after a rewrite? This seems somewhat trivial so
what am I missing? I have started a simple implementation of this, but
stopped short of combining the highlight spans (seems like a nasty n^2
problem that I don't know a good algorithm around: every new
highlight has to be compared against every previous highlight for
overlap. I am sure you're the man to ask about this). I plan on getting
back into this soon. Not trivial? Or do you just mean with the
fragments...you seem to be deeply interested in fragments but a lot of
people seem to just want to highlight the source text.
Any words of wisdom would be sorely appreciated.
- Mark
markharw00d wrote:
This is a deficiency in the highlighter functionality that has been
discussed several times before. The summary is - not a trivial fix.
See here for background:
http://marc2.theaimsgroup.com/?l=lucene-user&m=114631181214303&w=1
http://www.gossamer-threads.com/lists/engine?do=post_view_printable;post=42014;list=lucene
Cheers,
Mark
Anne Conger wrote:
Hi,
I'm wondering what the best way is to do highlighting of multiword
phrases.
For example, if a search is for "president kennedy", how can I make sure
that "president" is only highlighted if it is next to "kennedy", and
that "president" in "president clinton" is not?
I haven't figured out where in the process the phrases are being split
into separate words.
Would restructuring the query that is passed to the scorer help with
this? It's currently a set of boolean queries with each phrase as a
separate query. Or should the exact phrases be set up as WeightedTerms?
Thanks!
Anne
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]