Alan, if you want I can just merge the branch up next week and we iterate from there?
simon

On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Yep, the first challenge is always getting the old patch(es) to apply.....
>
> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
> <alan.woodw...@romseysoftware.co.uk> wrote:
>> Thanks for all the offers of help! It looks as though most of the hard work
>> has already been done, which is exactly where I like to pick up projects.
>> :-)
>>
>> Maybe the best place to start would be for me to rebase the branch against
>> trunk, and see what still fits? I think there have been some fairly major
>> changes in the internals since July last year.
>>
>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>
>>> I posted a patch with a Collector somewhat similar to what you described,
>>> Alan - it's attached to one of the sub-issues
>>> https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly
>>> complete "alpha" state, but has seen no production use of course, since it
>>> relies on the remainder of the unfinished work in that branch. It works by
>>> creating a TokenStream based on match positions returned from the query and
>>> passing that to the existing Highlighter. Please feel free to get in touch
>>> if you decide to look into that and have questions.
>>>
>>> -Mike
>>>
>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>>
>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>
>>>> yes I did
>>>>
>>>>> -----
>>>>> Uwe Schindler
>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>> http://www.thetaphi.de
>>>>> eMail: u...@thetaphi.de
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>> To: dev@lucene.apache.org
>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>
>>>>>> Alan, you made my day!
>>>>>>
>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>> quite a big one and it's in a very early stage. Still I want to
>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>> with the issues!
>>>>>>
>>>>>> let me know if you have questions!
>>>>>>
>>>>>> simon
>>>>>>
>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>
>>>>>>> Cool, thanks Robert. I'll take a look at the JIRA ticket.
>>>>>>>
>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>
>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>> covered by the existing highlighter modules. I'm working round this
>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>> term offsets available - see
>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826). This works for
>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>
>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>> provide accurate highlighting in Lucene. I'm going to have a couple
>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>> implementing this. Mainly I'm wondering if there's already been
>>>>>>>>> thoughts about how to do it. My current thoughts are to somehow
>>>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>>> offsets in place of the scores.
>>>>>>>>>
>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>
>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>> lagging behind trunk a bit.
>>>>>>>>
>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>> support offsets in the postings lists.
>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>
>>>>>>>> --
>>>>>>>> lucidimagination.com
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
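The thread's proposal is to collect term offsets (which, per Robert, postings lists can now store) in place of scores and highlight from those offsets directly. The last step, turning character offsets into marked-up text, can be sketched in plain Java. This is a hypothetical helper, not part of Lucene's API: the class `OffsetHighlighter` and its methods `highlight` and `mergeRanges` are made-up names for illustration, and the offsets are assumed to be half-open [start, end) character ranges into the stored field text, supplied by whatever collector gathers them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper, NOT part of Lucene: wraps character-offset ranges of a
 * stored field value in highlight tags. Offsets are assumed to be half-open
 * [start, end) pairs, e.g. as gathered by a collector reading offsets from
 * the postings lists.
 */
public final class OffsetHighlighter {

    private OffsetHighlighter() {}

    /** Validates, sorts and merges overlapping or touching ranges; drops empty ones. */
    static List<int[]> mergeRanges(int[][] ranges, int textLength) {
        for (int[] r : ranges) {
            if (r.length != 2 || r[0] < 0 || r[0] > r[1] || r[1] > textLength) {
                throw new IllegalArgumentException("bad offset range: " + Arrays.toString(r));
            }
        }
        int[][] sorted = ranges.clone();
        Arrays.sort(sorted, Comparator.comparingInt(range -> range[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] r : sorted) {
            if (r[0] == r[1]) {
                continue; // empty range: nothing to highlight
            }
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r[0] <= last[1]) {
                last[1] = Math.max(last[1], r[1]); // overlaps or touches the previous hit
            } else {
                merged.add(new int[] {r[0], r[1]}); // copy, so the caller's arrays stay untouched
            }
        }
        return merged;
    }

    /** Returns text with every (merged) offset range wrapped in pre/post tags. */
    public static String highlight(String text, int[][] offsets, String pre, String post) {
        StringBuilder sb = new StringBuilder(text.length());
        int pos = 0;
        for (int[] r : mergeRanges(offsets, text.length())) {
            sb.append(text, pos, r[0]);                           // text before the hit
            sb.append(pre).append(text, r[0], r[1]).append(post); // the hit itself
            pos = r[1];
        }
        sb.append(text, pos, text.length());                      // trailing text
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Using term offsets for hit highlighting";
        // Unsorted offsets for "hit", "term" and "offsets".
        int[][] offsets = {{23, 26}, {6, 10}, {11, 18}};
        // prints: Using <b>term</b> <b>offsets</b> for <b>hit</b> highlighting
        System.out.println(highlight(text, offsets, "<b>", "</b>"));
    }
}
```

Merging overlapping ranges keeps nested matches (say, a phrase and one of its own terms) from producing interleaved tags. A real highlighter would also need to pick fragments around the hits, which this sketch leaves out.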