Hello,

The project I'm currently working on requires the reporting of exact hit 
positions from some pretty hairy queries, not all of which are covered by the 
existing highlighter modules.  I'm working round this by translating everything 
into SpanQueries, and using the getSpans() method to locate hits (I've extended 
the Spans interface to make term offsets available - see 
https://issues.apache.org/jira/browse/LUCENE-3826).  This works for our 
use-case, but isn't terribly efficient, and obviously isn't applicable to 
non-Span queries.

I've seen a bit of chatter on the list about using term offsets to provide 
accurate highlighting in Lucene.  I'm going to have a couple of weeks free in 
April, and I thought I might have a go at implementing this.  Mainly I'm 
wondering whether there have already been any thoughts about how to do it.  My 
current thinking is to somehow extend the Weight and Scorer APIs to make term 
offsets available; to get highlights for a given set of documents, you'd 
essentially run the query again, with a filter on just the documents you want 
highlighted, and have a custom collector that gets the term offsets in place of 
the scores.
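
To make the idea a bit more concrete, here is a rough sketch in plain Java.  
None of these types exist in Lucene: TermOffset, OffsetScorer, OffsetCollector 
and TermOffsetScorer are hypothetical stand-ins I've made up for illustration, 
and the toy scorer does naive substring matching over in-memory strings rather 
than reading real postings.  It only shows the shape of the flow: iterate the 
matches again, skip documents outside the wanted set, and collect offsets where 
a normal collector would collect scores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical: character range [start, end) of one matched term.
final class TermOffset {
    final int start;
    final int end;
    TermOffset(int start, int end) { this.start = start; this.end = end; }
}

// Hypothetical scorer-like iterator that exposes offsets for the current doc.
interface OffsetScorer {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();                 // advance to the next matching document
    List<TermOffset> offsets();    // offsets of matches in the current document
}

// Hypothetical collector: gathers offsets, not scores, for the wanted docs only.
final class OffsetCollector {
    private final Set<Integer> wanted;
    private final Map<Integer, List<TermOffset>> hits = new TreeMap<>();
    OffsetCollector(Set<Integer> wanted) { this.wanted = wanted; }

    void collect(int doc, OffsetScorer scorer) {
        if (wanted.contains(doc)) {          // plays the role of the filter
            hits.put(doc, scorer.offsets());
        }
    }
    Map<Integer, List<TermOffset>> hits() { return hits; }
}

// Toy single-term scorer over in-memory text (substring match, no analysis).
final class TermOffsetScorer implements OffsetScorer {
    private final String[] docs;
    private final String term;
    private int doc = -1;
    TermOffsetScorer(String[] docs, String term) { this.docs = docs; this.term = term; }

    public int nextDoc() {
        while (++doc < docs.length) {
            if (docs[doc].contains(term)) return doc;
        }
        return NO_MORE_DOCS;
    }

    public List<TermOffset> offsets() {
        List<TermOffset> out = new ArrayList<>();
        String text = docs[doc];
        for (int i = text.indexOf(term); i >= 0; i = text.indexOf(term, i + 1)) {
            out.add(new TermOffset(i, i + term.length()));
        }
        return out;
    }
}

final class OffsetHighlightSketch {
    // "Run the query again" restricted to the documents we want highlighted.
    static Map<Integer, List<TermOffset>> highlight(String[] docs, String term,
                                                    Set<Integer> wanted) {
        OffsetScorer scorer = new TermOffsetScorer(docs, term);
        OffsetCollector collector = new OffsetCollector(wanted);
        for (int d = scorer.nextDoc(); d != OffsetScorer.NO_MORE_DOCS; d = scorer.nextDoc()) {
            collector.collect(d, scorer);
        }
        return collector.hits();
    }
}
```

In a real implementation the offsets would presumably come from the index 
rather than from re-scanning the text, and compound queries would need some way 
to combine the offsets of their sub-scorers; the sketch sidesteps both.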

All pointers gratefully received!

Thanks,

Alan Woodward
