>> Isn't it semi trivial if you are not interested in the fragments (I swear it seems that most people are not)?

I haven't conducted a survey, but it's the typical web search engine scenario - select only a small subset of the matching document content for display in SERPs. I would expect that to be a pretty commonplace requirement, and one for which we should retain a solution.

Maybe a new highlighter with no attempt at summarising could more easily address phrase support for small pieces of content. It will always be hard to faithfully represent all possible query match logic - especially if there are NOTs, ANDs and ORs mixed in with all the term proximity logic, e.g. NotNear. Some compromise is required. I did suggest that spans may be a better basis for highlighting than terms, and pointed at some existing code to get you along this path - see here: http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
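
To make the "no summarising" idea concrete, here is a rough sketch in plain Java. It is not Lucene code and not the existing Highlighter: PhraseHighlightSketch, its regex tokenizer (a crude stand-in for an Analyzer) and the <B> tags are all assumptions made for illustration, and it handles only a single exact phrase with no slop, NOTs, ANDs or ORs. It marks a term only when the whole phrase matches on adjacent tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: highlights whole-phrase matches only.
class PhraseHighlightSketch {

    // A token plus its character offsets in the source text.
    private static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Assumption: lower-cased runs of letters/digits stand in for a real Analyzer.
    private static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.group().toLowerCase(Locale.ROOT), m.start(), m.end()));
        }
        return tokens;
    }

    // Wraps each exact, adjacent occurrence of the phrase in <B>...</B>.
    static String highlight(String text, String phrase) {
        List<Token> doc = tokenize(text);
        List<Token> query = tokenize(phrase);
        if (query.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder();
        int copied = 0; // offset up to which the source text has been copied
        int i = 0;
        while (i + query.size() <= doc.size()) {
            boolean match = true;
            for (int j = 0; j < query.size(); j++) {
                if (!doc.get(i + j).term.equals(query.get(j).term)) {
                    match = false;
                    break;
                }
            }
            if (match) {
                int start = doc.get(i).start;
                int end = doc.get(i + query.size() - 1).end;
                out.append(text, copied, start)
                   .append("<B>").append(text, start, end).append("</B>");
                copied = end;
                i += query.size(); // skip past the match so highlights never overlap
            } else {
                i++;
            }
        }
        out.append(text, copied, text.length());
        return out.toString();
    }
}
```

Calling PhraseHighlightSketch.highlight("President Kennedy met President Clinton.", "president kennedy") marks only the first two words; the second "President" is left alone because "kennedy" does not follow it.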

There are also a couple of other Highlighter packages contributed recently, which I listed in my previous mail. I simply haven't had the time to look at them in detail, but they may be useful. Has anyone had any experience of those?

>> every new highlight has to be compared against every previous highlight for overlap

Yes, Analyzers that produce overlapping tokens are an added complication when implementing highlighting logic. I think we have a reasonable JUnit test containing several of the more exotic analyzer scenarios, which you could/should use for testing any other highlighter implementation.
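
On the n^2 worry itself, one common approach is to sort the highlights by start offset first, so each one only needs comparing with the most recently merged region. A minimal sketch, assuming each highlight is a plain {start, end} pair of character offsets with an exclusive end; SpanMergeSketch and merge are hypothetical names, not part of the Highlighter package.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: merges overlapping highlight regions.
class SpanMergeSketch {

    // Each span is {start, end} in character offsets, end exclusive (an assumption).
    // Sorting costs O(n log n); the sweep afterwards is a single linear pass.
    static List<int[]> merge(List<int[]> spans) {
        List<int[]> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((int[] s) -> s[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] s : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && s[0] <= last[1]) {
                // Overlaps (or touches) the previous region: extend it.
                last[1] = Math.max(last[1], s[1]);
            } else {
                // Starts after the previous region ends: begin a new region.
                merged.add(new int[] {s[0], s[1]});
            }
        }
        return merged;
    }
}
```

For example, the spans {10,20}, {0,5}, {3,8} and {18,25} collapse to {0,8} and {10,25}. Note that this sketch also joins spans that merely touch; use a strict < in the comparison if adjacent highlights should stay separate.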

Cheers,
Mark



Mark Miller wrote:
Isn't it semi trivial if you are not interested in the fragments (I swear it seems that most people are not)? Wasn't it you who suggested turning the query into a SpanQuery, extracting the spans, and then doing the highlighting after a rewrite? This seems somewhat trivial, so what am I missing? I have started a simple implementation of this, but stopped short of combining the highlight spans. That seems like a nasty n^2 problem that I don't know a good algorithm for: every new highlight has to be compared against every previous highlight for overlap. I am sure you're the man to ask about this. I plan on getting back into this soon. Not trivial? Or do you just mean with the fragments? You seem to be deeply interested in fragments, but a lot of people seem to just want to highlight the source text.

Any words of wisdom would be sorely appreciated.

- Mark

markharw00d wrote:
This is a deficiency in the highlighter functionality that has been discussed several times before. The summary is - not a trivial fix.

See here for background:

http://marc2.theaimsgroup.com/?l=lucene-user&m=114631181214303&w=1

http://www.gossamer-threads.com/lists/engine?do=post_view_printable;post=42014;list=lucene


Cheers,
Mark

Anne Conger wrote:
Hi,

I'm wondering what the best way is to do highlighting of multiword phrases. For example, if a search is for "president kennedy", how can I make sure
that "president" is only highlighted if it is next to "kennedy", and that
"president" in "president clinton" is not?
I haven't figured out where in the process the phrases are being split into
separate words.
Would restructuring the query that is passed to the scorer help with this?
It's currently a set of boolean queries with each phrase as a separate
query.  Or should the exact phrases be set up as WeightedTerms?

Thanks!

Anne


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





