Hello,

The project I'm currently working on requires reporting exact hit positions from some pretty hairy queries, not all of which are covered by the existing highlighter modules. I'm working round this by translating everything into SpanQueries and using the getSpans() method to locate hits (I've extended the Spans interface to make term offsets available; see https://issues.apache.org/jira/browse/LUCENE-3826). This works for our use case, but isn't terribly efficient, and obviously isn't applicable to non-Span queries.
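For illustration, here is a minimal, self-contained sketch of the span-based workaround, assuming a Spans-like iterator that has been extended with character offsets. `OffsetSpans` is modelled on Lucene's Spans (`next()`, `doc()`, `start()`, `end()`), but the offset accessors, `Hit`, `collectHits` and `ArraySpans` are hypothetical names. They are not the actual LUCENE-3826 API. In real code the iterator would come from `SpanQuery.getSpans()` instead of an in-memory array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SpanHitCollector {

    // Hypothetical interface modelled on Lucene's Spans, plus offset accessors
    // standing in for the LUCENE-3826 extension (method names are assumptions).
    interface OffsetSpans {
        boolean next();     // advance to the next match; false when exhausted
        int doc();          // document id of the current match
        int start();        // first token position of the match
        int end();          // one past the last token position
        int startOffset();  // assumed: character offset where the match starts
        int endOffset();    // assumed: character offset where the match ends
    }

    // One reported hit: document id plus character offsets.
    static final class Hit {
        final int doc, startOffset, endOffset;
        Hit(int doc, int startOffset, int endOffset) {
            this.doc = doc;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
        @Override public String toString() {
            return "doc=" + doc + " [" + startOffset + "," + endOffset + ")";
        }
    }

    // Walk every match and keep only those in the documents being highlighted.
    static List<Hit> collectHits(OffsetSpans spans, Set<Integer> wantedDocs) {
        List<Hit> hits = new ArrayList<>();
        while (spans.next()) {
            if (wantedDocs.contains(spans.doc())) {
                hits.add(new Hit(spans.doc(), spans.startOffset(), spans.endOffset()));
            }
        }
        return hits;
    }

    // In-memory OffsetSpans for demonstration; each row is
    // {doc, startPos, endPos, startOffset, endOffset}, ordered by doc then position.
    static final class ArraySpans implements OffsetSpans {
        private final int[][] rows;
        private int i = -1;
        ArraySpans(int[][] rows) { this.rows = rows; }
        public boolean next() { return ++i < rows.length; }
        public int doc() { return rows[i][0]; }
        public int start() { return rows[i][1]; }
        public int end() { return rows[i][2]; }
        public int startOffset() { return rows[i][3]; }
        public int endOffset() { return rows[i][4]; }
    }

    public static void main(String[] args) {
        OffsetSpans spans = new ArraySpans(new int[][] {
            {0, 2, 4, 10, 21},
            {3, 0, 2, 0, 9},
            {3, 5, 7, 30, 42},
        });
        // Only documents 3 and 7 are being highlighted.
        for (Hit h : collectHits(spans, Set.of(3, 7))) {
            System.out.println(h);  // prints "doc=3 [0,9)" then "doc=3 [30,42)"
        }
    }
}
```

The loop visits every match and discards those in documents nobody asked about. That is one plausible contributor to the inefficiency mentioned above, and it only works for queries that can be expressed as spans.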
I've seen a bit of chatter on the list about using term offsets to provide accurate highlighting in Lucene. I'm going to have a couple of weeks free in April, and I thought I might have a go at implementing this. Mainly I'm wondering whether anyone has already given thought to how it should be done.

My current thought is to somehow extend the Weight and Scorer interfaces to make term offsets available. To get highlights for a given set of documents, you'd essentially run the query again with a filter restricting it to just the documents you want highlighted, and use a custom collector that gathers the term offsets instead of the scores.

All pointers gratefully received!

Thanks,
Alan Woodward