Yep, the first challenge is always getting the old patch(es) to apply...

On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
<alan.woodw...@romseysoftware.co.uk> wrote:
> Thanks for all the offers of help! It looks as though most of the hard work
> has already been done, which is exactly where I like to pick up projects. :-)
>
> Maybe the best place to start would be for me to rebase the branch against
> trunk, and see what still fits? I think there have been some fairly major
> changes in the internals since July last year.
>
> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>
>> I posted a patch with a Collector somewhat similar to what you described,
>> Alan - it's attached to one of the sub-issues
>> https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly
>> complete "alpha" state, but has seen no production use of course, since it
>> relies on the remainder of the unfinished work in that branch. It works by
>> creating a TokenStream based on match positions returned from the query and
>> passing that to the existing Highlighter. Please feel free to get in touch
>> if you decide to look into that and have questions.
>>
>> -Mike
>>
>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>
>>>> Have you marked that for GSOC? Would be a good idea!
>>>>
>>> yes I did
>>>
>>>> -----
>>>> Uwe Schindler
>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>> http://www.thetaphi.de
>>>> eMail: u...@thetaphi.de
>>>>
>>>>> -----Original Message-----
>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>> To: dev@lucene.apache.org
>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>
>>>>> Alan, you made my day!
>>>>>
>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>> certainly help to get it up to speed.
>>>>> The feature in that branch is quite a big one and it's at a very early
>>>>> stage. Still, I want to encourage you to take a look and work on it. I
>>>>> promise all my help with the issues!
>>>>>
>>>>> Let me know if you have questions!
>>>>>
>>>>> simon
>>>>>
>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>
>>>>>> Cool, thanks Robert. I'll take a look at the JIRA ticket.
>>>>>>
>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>
>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>> covered by the existing highlighter modules. I'm working round this
>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>> term offsets available - see
>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826). This works for
>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>> applicable to non-Span queries.
>>>>>>>>
>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>> provide accurate highlighting in Lucene. I'm going to have a couple
>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>> implementing this. Mainly I'm wondering whether there have already
>>>>>>>> been any thoughts about how to do it.
>>>>>>>> My current thoughts are to somehow extend the Weight and Scorer
>>>>>>>> interfaces to make term offsets available; to get highlights for a
>>>>>>>> given set of documents, you'd essentially run the query again, with a
>>>>>>>> filter on just the documents you want highlighted, and have a custom
>>>>>>>> collector that gets the term offsets in place of the scores.
>>>>>>>>
>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>
>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>> lagging behind trunk a bit.
>>>>>>>
>>>>>>> Additionally, at the time it was first done, I think we didn't yet
>>>>>>> support offsets in the postings lists.
>>>>>>> We've since added this and several codecs support it.
>>>>>>>
>>>>>>> --
>>>>>>> lucidimagination.com
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
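Alan's proposal in the thread above (re-run the query restricted to the documents you want highlighted, and use a collector that records term offsets in place of scores) can be illustrated with a minimal, self-contained Java sketch. Everything in it is hypothetical: `OffsetScorer`, `collectOffsets`, `fromMap` and `highlight` are illustrative names, not Lucene APIs, and the simple marker insertion in `highlight` merely stands in for the existing Highlighter rather than reproducing it. It assumes each match is reported as a non-overlapping `[start, end)` character range.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OffsetHighlightSketch {

    /** Hypothetical stand-in for an offsets-aware Scorer; NOT a Lucene interface. */
    public interface OffsetScorer {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        /** Advances to the next matching document and returns its id, or NO_MORE_DOCS. */
        int nextDoc();

        /** [start, end) character offsets of the matches in the current document. */
        List<int[]> offsets();
    }

    /** Records match offsets instead of scores, but only for the documents being highlighted. */
    public static Map<Integer, List<int[]>> collectOffsets(OffsetScorer scorer, Set<Integer> docsToHighlight) {
        Map<Integer, List<int[]>> hits = new LinkedHashMap<>();
        for (int doc = scorer.nextDoc(); doc != OffsetScorer.NO_MORE_DOCS; doc = scorer.nextDoc()) {
            if (docsToHighlight.contains(doc)) { // plays the role of the document filter
                hits.put(doc, new ArrayList<>(scorer.offsets()));
            }
        }
        return hits;
    }

    /** Wraps each [start, end) span in pre/post markers; assumes the spans do not overlap. */
    public static String highlight(String text, List<int[]> spans, String pre, String post) {
        List<int[]> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(s -> s[0]));
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] s : sorted) {
            out.append(text, last, s[0]).append(pre).append(text, s[0], s[1]).append(post);
            last = s[1];
        }
        out.append(text.substring(last));
        return out.toString();
    }

    /** Tiny in-memory scorer over precomputed (doc -> offsets) matches, for demonstration only. */
    public static OffsetScorer fromMap(Map<Integer, List<int[]>> matches) {
        List<Integer> docs = new ArrayList<>(matches.keySet());
        docs.sort(null); // natural (ascending) doc id order, as a real scorer would iterate
        return new OffsetScorer() {
            private int i = -1;

            public int nextDoc() {
                i++;
                return i < docs.size() ? docs.get(i) : NO_MORE_DOCS;
            }

            public List<int[]> offsets() {
                return matches.get(docs.get(i));
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, List<int[]>> matches = new LinkedHashMap<>();
        matches.put(3, List.of(new int[] {5, 12}));
        matches.put(7, List.of(new int[] {0, 4}, new int[] {13, 17}));
        // Only doc 7 is "selected" for highlighting, so doc 3's offsets are never collected.
        Map<Integer, List<int[]>> hits = collectOffsets(fromMap(matches), Set.of(7));
        // prints "<b>term</b> offsets <b>rock</b>"
        System.out.println(highlight("term offsets rock", hits.get(7), "<b>", "</b>"));
    }
}
```

The design choice mirrors the proposal: scoring is replaced by offset collection, and restricting collection to a known set of documents keeps the second pass cheap. A real implementation would read offsets from the postings (which, as noted in the thread, several codecs now support) rather than from a precomputed map.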