Have you marked that for GSOC? Would be a good idea!

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Monday, March 19, 2012 4:43 PM
> To: dev@lucene.apache.org
> Subject: Re: Using term offsets for hit highlighting
>
> Alan, you made my day!
>
> The branch is kind of outdated but I looked at it lately and I can
> certainly help to get it up to speed. The feature in that branch is quite
> a big one and it's in a very early stage. Still I want to encourage you to
> take a look and work on it. I promise all my help with the issues!
>
> let me know if you have questions!
>
> simon
>
> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
> <alan.woodw...@romseysoftware.co.uk> wrote:
> > Cool, thanks Robert. I'll take a look at the JIRA ticket.
> >
> > On 19 Mar 2012, at 14:44, Robert Muir wrote:
> >
> >> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
> >> <alan.woodw...@romseysoftware.co.uk> wrote:
> >>> Hello,
> >>>
> >>> The project I'm currently working on requires the reporting of exact
> >>> hit positions from some pretty hairy queries, not all of which are
> >>> covered by the existing highlighter modules. I'm working round this
> >>> by translating everything into SpanQueries, and using the getSpans()
> >>> method to locate hits (I've extended the Spans interface to make
> >>> term offsets available - see
> >>> https://issues.apache.org/jira/browse/LUCENE-3826). This works for
> >>> our use-case, but isn't terribly efficient, and obviously isn't
> >>> applicable to non-Span queries.
> >>>
> >>> I've seen a bit of chatter on the list about using term offsets to
> >>> provide accurate highlighting in Lucene. I'm going to have a couple
> >>> of weeks free in April, and I thought I might have a go at
> >>> implementing this. Mainly I'm wondering if there's already been
> >>> thoughts about how to do it. My current thoughts are to somehow
> >>> extend the Weight and Scorer interface to make term offsets
> >>> available; to get highlights for a given set of documents, you'd
> >>> essentially run the query again, with a filter on just the documents
> >>> you want highlighted, and have a custom collector that gets the term
> >>> offsets in place of the scores.
> >>>
> >>
> >> Hi Alan, Simon started some initial work on
> >> https://issues.apache.org/jira/browse/LUCENE-2878
> >>
> >> Some work and prototypes were done in a branch, but it might be
> >> lagging behind trunk a bit.
> >>
> >> Additionally at the time it was first done, I think we didn't yet
> >> support offsets in the postings lists.
> >> We've since added this and several codecs support it.
> >>
> >> --
> >> lucidimagination.com
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
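
For readers following the thread, the idea Alan describes in the quoted message (run the query again, restricted to the documents you want highlighted, and have a collector receive term offsets in place of scores) can be sketched in a few lines. This is only a toy model under loud assumptions: it uses no Lucene classes at all, the names OffsetCollectorSketch, Posting, OffsetCollector, postingsFor, search and highlightOffsets are all hypothetical, and the "postings list" is built by plain substring matching rather than by analysis and indexing. It is not how LUCENE-2878 or LUCENE-3826 implement anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types are Lucene APIs; all names are hypothetical. */
public class OffsetCollectorSketch {

    /** One occurrence of a term in a document, with its character offsets. */
    static final class Posting {
        final int docId;
        final int startOffset;
        final int endOffset;

        Posting(int docId, int startOffset, int endOffset) {
            this.docId = docId;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Hypothetical collector that is handed offsets instead of scores. */
    interface OffsetCollector {
        void collect(int docId, int startOffset, int endOffset);
    }

    /**
     * Stand-in for a postings list that stores offsets. Assumption: plain
     * substring matching is used here instead of real tokenization.
     */
    static List<Posting> postingsFor(String term, List<String> docs) {
        List<Posting> postings = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            String text = docs.get(docId);
            int from = 0;
            int start;
            while ((start = text.indexOf(term, from)) >= 0) {
                postings.add(new Posting(docId, start, start + term.length()));
                from = start + term.length();
            }
        }
        return postings;
    }

    /** "Runs the query again": walks the postings, skipping documents outside the filter. */
    static void search(List<Posting> postings, Set<Integer> filter, OffsetCollector collector) {
        for (Posting p : postings) {
            if (filter.contains(p.docId)) {
                collector.collect(p.docId, p.startOffset, p.endOffset);
            }
        }
    }

    /** Gathers [start, end) offset pairs per document for the documents to highlight. */
    static Map<Integer, List<int[]>> highlightOffsets(
            String term, List<String> docs, Set<Integer> docsToHighlight) {
        Map<Integer, List<int[]>> hits = new LinkedHashMap<>();
        search(postingsFor(term, docs), docsToHighlight,
                (doc, start, end) ->
                        hits.computeIfAbsent(doc, d -> new ArrayList<>()).add(new int[] {start, end}));
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "term offsets for hit highlighting",
                "no match here",
                "a term and another term");
        // Only document 2 is highlighted; document 0 also matches but is filtered out.
        Map<Integer, List<int[]>> hits = highlightOffsets("term", docs, Set.of(2));
        hits.forEach((doc, spans) -> spans.forEach(
                s -> System.out.println("doc " + doc + ": [" + s[0] + ", " + s[1] + ")")));
    }
}
```

The point of the sketch is only the division of labour: the offsets come from the postings, the filter limits the second pass to the documents being displayed, and the collector decides what to keep. In a real implementation the offsets would come from postings that were indexed with offsets, as Robert notes several codecs now support.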