Using the original org.apache.lucene.search.highlight.Highlighter, should I be 
able to give it a query like [ My AND Words AND "My Words"^100 ] (the actual 
phrase in this query is converted to a span query with a slop of 1),
and expect it to find the fragment many pages into the file that has the span 
"My Words" and rank it better than fragments earlier in the document with "My" 
and "Words" (or lots of "My" and "Words")?

I ask because currently I'm not getting the fragment with the phrase as the 
best fragment, so I go through some hacky post processing to look down the 
list for a "better" match, but I'm wondering if we have the HitHighlighter 
wired up wrong.
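
To make the post processing concrete, here is a simplified sketch of the idea 
in plain Java. It is an illustration only, not our real code: FragmentPicker 
and pickBest are hypothetical names, and it assumes the fragments come back 
as strings in ranked order with simple HTML-style highlight tags such as 
<B>...</B> around the matched terms.

```java
import java.util.List;
import java.util.Locale;

public class FragmentPicker {

    /**
     * Walks the ranked fragments and returns the first one whose text,
     * with highlight tags stripped, contains the exact phrase
     * (case-insensitive). Falls back to the top-ranked fragment when no
     * fragment contains the phrase, or null when the list is empty.
     */
    static String pickBest(List<String> rankedFragments, String phrase) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        for (String fragment : rankedFragments) {
            // Remove markup such as <B>...</B> so adjacent highlighted
            // terms can be compared against the plain phrase.
            String plain = fragment.replaceAll("<[^>]+>", "")
                                   .toLowerCase(Locale.ROOT);
            if (plain.contains(needle)) {
                return fragment;
            }
        }
        return rankedFragments.isEmpty() ? null : rankedFragments.get(0);
    }
}
```

This only does an exact substring check, so it would miss a match that 
relies on the slop of 1; it is meant to show the shape of the workaround, 
not to replace proper phrase-aware scoring in the highlighter.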

At this time, my index does not have offset and position vectors for all 
tokenized fields, and the body "text" field just has positions.

I understand that FastVectorHighlighter is fast, but would it do a better job 
of finding the phrase or span in the text if I added positions and offsets to 
text?

When highlighting the small fields like title, path, etc., should I add term 
vectors with positions and offsets and use FastVectorHighlighter, or is it just 
not worth storing that extra information just for highlighting?

-Paul

