That would be great, thanks! I had a go at merging it last night, but there are a *lot* of changes that I haven't got my head round yet, so it was getting pretty messy.
On 21 Mar 2012, at 08:49, Simon Willnauer wrote:

> Alan, if you want I can just merge the branch up next week and we
> iterate from there?
>
> simon
>
> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>> Yep, the first challenge is always getting the old patch(es) to apply.....
>>
>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>> Thanks for all the offers of help! It looks as though most of the hard
>>> work has already been done, which is exactly where I like to pick up
>>> projects. :-)
>>>
>>> Maybe the best place to start would be for me to rebase the branch against
>>> trunk, and see what still fits? I think there have been some fairly major
>>> changes in the internals since July last year.
>>>
>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>
>>>> I posted a patch with a Collector somewhat similar to what you described,
>>>> Alan - it's attached to one of the sub-issues
>>>> https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly
>>>> complete "alpha" state, but has seen no production use of course, since it
>>>> relies on the remainder of the unfinished work in that branch. It works
>>>> by creating a TokenStream based on match positions returned from the query
>>>> and passing that to the existing Highlighter. Please feel free to get in
>>>> touch if you decide to look into that and have questions.
>>>>
>>>> -Mike
>>>>
>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler<u...@thetaphi.de> wrote:
>>>>>
>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>
>>>>> yes I did
>>>>>
>>>>>> -----
>>>>>> Uwe Schindler
>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>> http://www.thetaphi.de
>>>>>> eMail: u...@thetaphi.de
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>> To: dev@lucene.apache.org
>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>
>>>>>>> Alan, you made my day!
>>>>>>>
>>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>> quite a big one and its in a very early stage. Still I want to
>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>> with the issues!
>>>>>>>
>>>>>>> let me know if you have questions!
>>>>>>>
>>>>>>> simon
>>>>>>>
>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>
>>>>>>>> Cool, thanks Robert. I'll take a look at the JIRA ticket.
>>>>>>>>
>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>
>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>>> covered by the existing highlighter modules. I'm working round this
>>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>>> term offsets available - see
>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826). This works for
>>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>>
>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>> provide accurate highlighting in Lucene. I'm going to have a couple
>>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>>> implementing this. Mainly I'm wondering if there's already been
>>>>>>>>>> thoughts about how to do it. My current thoughts are to somehow
>>>>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>>>> offsets in place of the scores.
>>>>>>>>>>
>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>
>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>
>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>> support offsets in the postings lists.
>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> lucidimagination.com
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org