Re: highlighting phrases

markharw00d Wed, 01 Sep 2004 00:48:23 -0700

Adding support for phrases could be tricky.
So far I have deliberately avoided reimplementing specialized highlighting
logic for each of the different types of queries, e.g. understanding the
nuances of "slop factor" in phrase queries. I may be wrong, but adding
specialized support for different query types just feels like the start of a
slippery slope.


If people are keen to add such support though, here are some pointers to bear
in mind...

Remember that the highlighter is also designed to summarize docs by selecting
the best fragments.
One decision to be made up front is whether a special "Fragmenter"
implementation is required that uses the query to influence the way it breaks
the doc into fragments, i.e. it ensures that matching words in phrase queries
or span queries remain in the same fragment.
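
To make the idea concrete, here is a rough sketch of the boundary decision
such a query-aware fragmenter might make. It is not the highlighter's
Fragmenter interface: the class name, the method and the int[][] span
representation are all hypothetical, and it assumes the token ranges of
phrase/span matches are already known.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of the highlighter package.
final class PhraseAwareFragmenterSketch {

    /**
     * Returns the token index at which each fragment starts.
     *
     * @param tokenCount     number of tokens in the document
     * @param targetSize     preferred fragment length in tokens
     * @param protectedSpans inclusive {start, end} token ranges of phrase or
     *                       span matches that must not be split
     */
    static List<Integer> fragmentStarts(int tokenCount, int targetSize,
                                        int[][] protectedSpans) {
        List<Integer> starts = new ArrayList<>();
        if (tokenCount == 0) {
            return starts;
        }
        starts.add(0);
        int fragmentStart = 0;
        for (int i = 1; i < tokenCount; i++) {
            boolean sizeReached = i - fragmentStart >= targetSize;
            // Delay the break while it would fall inside a match.
            if (sizeReached && !splitsASpan(i, protectedSpans)) {
                starts.add(i);
                fragmentStart = i;
            }
        }
        return starts;
    }

    // A break before token i splits a span that starts before i
    // and ends at or after i.
    private static boolean splitsASpan(int i, int[][] spans) {
        for (int[] s : spans) {
            if (s[0] < i && i <= s[1]) {
                return true;
            }
        }
        return false;
    }
}
```

With 10 tokens, a target size of 4 and a match covering tokens 3 to 5, the
break that would normally fall at token 4 is pushed back to token 6, so the
match stays in one fragment at the cost of a longer fragment.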

If phrase matches are allowed to span fragments, thought needs to be given to
how the fragments are scored.

Do phrases/spans get marked up with one tag, e.g. <B>My Phrase</B>, or many,
e.g. <B>My</B> <B>Phrase</B>?
I expect "many" is the answer, given the possibility of other query terms
appearing intermingled in a phrase with a high slop factor or a span.
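
A small sketch of the "many tags" option, to show why it copes with
intermingled words. The class and method are hypothetical (this is not the
Formatter interface), and it assumes the character offsets of the matching
terms are supplied sorted and non-overlapping.

```java
// Hypothetical sketch, not part of the highlighter package.
final class PerTermMarkupSketch {

    /**
     * Wraps each matched term in its own <B> tag.
     *
     * @param text         the original text
     * @param matchOffsets {start, end} character offsets of matched terms,
     *                     end exclusive, sorted and non-overlapping
     */
    static String markUp(String text, int[][] matchOffsets) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (int[] m : matchOffsets) {
            // Copy the unmatched text before this term unchanged.
            out.append(text, cursor, m[0]);
            out.append("<B>").append(text, m[0], m[1]).append("</B>");
            cursor = m[1];
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }
}
```

For the text "my quick phrase" with matches at offsets 0-2 and 9-15, this
gives <B>my</B> quick <B>phrase</B>: the word sitting between the phrase
terms is left alone, which a single enclosing tag could not do.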

The position of terms in the phrases will need to be known by the Formatter
implementation before attempting to mark up the text. This could/should be
done using position info in the Lucene index rather than requiring a separate
analyzer pass over the original text.
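
As an illustration of what can be done once positions are available, the
sketch below finds where an exact phrase (no slop) starts, given the positions
of each phrase term in one document. How the positions are read from the
index is deliberately left out; the class, method and array layout are
hypothetical, and slop is not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; reading positions from the index is not shown.
final class PhrasePositionSketch {

    /**
     * @param positionsPerTerm positionsPerTerm[i] holds the sorted token
     *                         positions of the i-th phrase term in one doc
     * @return positions at which the whole phrase occurs consecutively
     */
    static List<Integer> exactPhraseStarts(int[][] positionsPerTerm) {
        List<Integer> starts = new ArrayList<>();
        if (positionsPerTerm.length == 0) {
            return starts;
        }
        for (int start : positionsPerTerm[0]) {
            boolean allPresent = true;
            // The i-th term must occur exactly i positions after the first.
            for (int i = 1; i < positionsPerTerm.length && allPresent; i++) {
                allPresent =
                    Arrays.binarySearch(positionsPerTerm[i], start + i) >= 0;
            }
            if (allPresent) {
                starts.add(start);
            }
        }
        return starts;
    }
}
```

For "my phrase is not my other phrase", "my" is at positions 0 and 4 and
"phrase" at 1 and 6, so only position 0 is reported; the second "my" and the
second "phrase" would not be marked up as part of the phrase.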

Most of this should be achievable using specialized implementations of
Formatter, Fragmenter and Scorer, so the main Highlighter code should be
untouched.

These are just some of the "gotchas" off the top of my head. I'm sure there
will be several more issues waiting to be revealed...
Hope this helps anyway.
Cheers
Mark
