Thanks for all the offers of help! It looks as though most of the hard work has already been done, which is exactly where I like to pick up projects. :-)
Maybe the best place to start would be for me to rebase the branch against trunk, and see what still fits? I think there have been some fairly major changes in the internals since July last year.

On 19 Mar 2012, at 17:07, Mike Sokolov wrote:

> I posted a patch with a Collector somewhat similar to what you described,
> Alan - it's attached to one of the sub-issues
> https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly
> complete "alpha" state, but has seen no production use of course, since it
> relies on the remainder of the unfinished work in that branch. It works by
> creating a TokenStream based on match positions returned from the query and
> passing that to the existing Highlighter. Please feel free to get in touch
> if you decide to look into that and have questions.
>
> -Mike
>
> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>
>>> Have you marked that for GSOC? Would be a good idea!
>>>
>> yes I did
>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>>
>>>> -----Original Message-----
>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>> To: dev@lucene.apache.org
>>>> Subject: Re: Using term offsets for hit highlighting
>>>>
>>>> Alan, you made my day!
>>>>
>>>> The branch is kind of outdated but I looked at it lately and I can
>>>> certainly help to get it up to speed. The feature in that branch is
>>>> quite a big one and its in a very early stage. Still I want to
>>>> encourage you to take a look and work on it. I promise all my help
>>>> with the issues!
>>>>
>>>> let me know if you have questions!
>>>>
>>>> simon
>>>>
>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>
>>>>> Cool, thanks Robert. I'll take a look at the JIRA ticket.
>>>>>
>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>
>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>> covered by the existing highlighter modules. I'm working round this
>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>> term offsets available - see
>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826). This works for
>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>> applicable to non-Span queries.
>>>>>>>
>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>> provide accurate highlighting in Lucene. I'm going to have a couple
>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>> implementing this. Mainly I'm wondering if there's already been
>>>>>>> thoughts about how to do it. My current thoughts are to somehow
>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>> offsets in place of the scores.
>>>>>>>
>>>>>> Hi Alan, Simon started some initial work on
>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>
>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>> lagging behind trunk a bit.
>>>>>>
>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>> support offsets in the postings lists.
>>>>>> We've since added this and several codecs support it.
>>>>>>
>>>>>> --
>>>>>> lucidimagination.com
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org