On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward <alan.woodw...@romseysoftware.co.uk> wrote:
> Hello,
>
> The project I'm currently working on requires the reporting of exact hit
> positions from some pretty hairy queries, not all of which are covered by
> the existing highlighter modules. I'm working round this by translating
> everything into SpanQueries, and using the getSpans() method to locate hits
> (I've extended the Spans interface to make term offsets available -
> see https://issues.apache.org/jira/browse/LUCENE-3826). This works for our
> use-case, but isn't terribly efficient, and obviously isn't applicable to
> non-Span queries.
>
> I've seen a bit of chatter on the list about using term offsets to provide
> accurate highlighting in Lucene. I'm going to have a couple of weeks free
> in April, and I thought I might have a go at implementing this. Mainly I'm
> wondering if there's already been thoughts about how to do it. My current
> thoughts are to somehow extend the Weight and Scorer interface to make term
> offsets available; to get highlights for a given set of documents, you'd
> essentially run the query again, with a filter on just the documents you
> want highlighted, and have a custom collector that gets the term offsets in
> place of the scores.

Hi Alan,

Simon started some initial work on
https://issues.apache.org/jira/browse/LUCENE-2878

Some work and prototypes were done in a branch, but it might be lagging
behind trunk a bit. Additionally, when that work was first done, I don't
think we supported offsets in the postings lists yet. We've since added
this, and several codecs support it.

--
lucidimagination.com
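
As a rough illustration of the idea in the quoted message (re-run the query restricted to the documents you want highlighted, and collect term offsets instead of scores), here is a toy sketch in plain Java. It does not use Lucene at all: the in-memory postings map and the names `OffsetCollectorSketch`, `Offset`, `index`, and `collectOffsets` are hypothetical, made up for this example, and a single-term lookup stands in for a real query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: none of these types are Lucene classes.
class OffsetCollectorSketch {

    /** Character span [start, end) of one term occurrence in a document. */
    static final class Offset {
        final int start;
        final int end;

        Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Builds term -> (docId -> offsets), i.e. postings that carry offsets. */
    static Map<String, Map<Integer, List<Offset>>> index(List<String> docs) {
        Map<String, Map<Integer, List<Offset>>> postings = new HashMap<>();
        Pattern word = Pattern.compile("\\w+");
        for (int docId = 0; docId < docs.size(); docId++) {
            Matcher m = word.matcher(docs.get(docId));
            while (m.find()) {
                String term = m.group().toLowerCase(Locale.ROOT);
                postings.computeIfAbsent(term, t -> new TreeMap<>())
                        .computeIfAbsent(docId, d -> new ArrayList<>())
                        .add(new Offset(m.start(), m.end()));
            }
        }
        return postings;
    }

    /**
     * "Runs the query again" for a single term, but only over the documents
     * in wantedDocs, and gathers offsets where a scorer would produce scores.
     */
    static Map<Integer, List<Offset>> collectOffsets(
            Map<String, Map<Integer, List<Offset>>> postings,
            String term,
            Set<Integer> wantedDocs) {
        Map<Integer, List<Offset>> hits = new TreeMap<>();
        Map<Integer, List<Offset>> perDoc =
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyMap());
        for (Map.Entry<Integer, List<Offset>> entry : perDoc.entrySet()) {
            if (wantedDocs.contains(entry.getKey())) { // the document filter
                hits.put(entry.getKey(), entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Lucene stores term offsets",
                "offsets help highlighting; offsets are cheap",
                "no match here");
        Map<String, Map<Integer, List<Offset>>> postings = index(docs);
        // Highlight only document 1, even though document 0 also matches.
        Set<Integer> wanted = new HashSet<>(Arrays.asList(1));
        System.out.println(collectOffsets(postings, "offsets", wanted));
    }
}
```

The point of the sketch is only the shape of the flow: offsets live next to the postings, the filter limits the second pass to the documents being displayed, and the collector keeps offsets rather than scores. How this would map onto Weight and Scorer is exactly the open question in LUCENE-2878.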