OK man. I will try to merge up the branch. I warn you, this is going to be messy and it might not compile, but I will make it reasonable so you can start.
simon

On Thu, May 17, 2012 at 8:03 AM, Alan Woodward <alan.woodw...@romseysoftware.co.uk> wrote:
> Sorry for vanishing for so long, life unexpectedly caught up with me... I'm
> going to have some time to look at this again next week though, if you're
> interested in picking it up again.
>
> On 21 Mar 2012, at 09:02, Alan Woodward wrote:
>
>> That would be great, thanks! I had a go at merging it last night, but there
>> are a *lot* of changes that I haven't got my head round yet, so it was
>> getting pretty messy.
>>
>> On 21 Mar 2012, at 08:49, Simon Willnauer wrote:
>>
>>> Alan, if you want I can just merge the branch up next week and we
>>> iterate from there?
>>>
>>> simon
>>>
>>> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
>>> <erickerick...@gmail.com> wrote:
>>>> Yep, the first challenge is always getting the old patch(es) to apply.....
>>>>
>>>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>> Thanks for all the offers of help! It looks as though most of the hard
>>>>> work has already been done, which is exactly where I like to pick up
>>>>> projects. :-)
>>>>>
>>>>> Maybe the best place to start would be for me to rebase the branch
>>>>> against trunk, and see what still fits? I think there have been some
>>>>> fairly major changes in the internals since July last year.
>>>>>
>>>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>>>
>>>>>> I posted a patch with a Collector somewhat similar to what you
>>>>>> described, Alan - it's attached to one of the sub-issues
>>>>>> https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly
>>>>>> complete "alpha" state, but has seen no production use of course, since
>>>>>> it relies on the remainder of the unfinished work in that branch. It
>>>>>> works by creating a TokenStream based on match positions returned from
>>>>>> the query and passing that to the existing Highlighter.
>>>>>> Please feel free to get in touch if you decide to look into that and
>>>>>> have questions.
>>>>>>
>>>>>> -Mike
>>>>>>
>>>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>>>>>
>>>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>>>
>>>>>>> yes I did
>>>>>>>
>>>>>>>> -----
>>>>>>>> Uwe Schindler
>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>> http://www.thetaphi.de
>>>>>>>> eMail: u...@thetaphi.de
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>>>> To: dev@lucene.apache.org
>>>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>>>
>>>>>>>>> Alan, you made my day!
>>>>>>>>>
>>>>>>>>> The branch is kind of outdated, but I looked at it lately and I can
>>>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>>>> quite a big one and it's at a very early stage. Still, I want to
>>>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>>>> with the issues!
>>>>>>>>>
>>>>>>>>> Let me know if you have questions!
>>>>>>>>>
>>>>>>>>> simon
>>>>>>>>>
>>>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>>>
>>>>>>>>>> Cool, thanks Robert. I'll take a look at the JIRA ticket.
>>>>>>>>>>
>>>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> The project I'm currently working on requires the reporting of
>>>>>>>>>>>> exact hit positions from some pretty hairy queries, not all of
>>>>>>>>>>>> which are covered by the existing highlighter modules.
>>>>>>>>>>>> I'm working round this by translating everything into SpanQueries,
>>>>>>>>>>>> and using the getSpans() method to locate hits (I've extended the
>>>>>>>>>>>> Spans interface to make term offsets available - see
>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826). This works for
>>>>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>>>>
>>>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>>>> provide accurate highlighting in Lucene. I'm going to have a couple
>>>>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>>>>> implementing this. Mainly I'm wondering if there have already been
>>>>>>>>>>>> thoughts about how to do it. My current thoughts are to somehow
>>>>>>>>>>>> extend the Weight and Scorer interfaces to make term offsets
>>>>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>>>>>> offsets in place of the scores.
>>>>>>>>>>>>
>>>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>>>
>>>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>>>
>>>>>>>>>>> Additionally, at the time it was first done, I think we didn't yet
>>>>>>>>>>> support offsets in the postings lists. We've since added this and
>>>>>>>>>>> several codecs support it.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> lucidimagination.com
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
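The idea Alan floats in the quoted thread is to re-run the query restricted to the documents you want highlighted, with a collector that records term offsets where it would normally record scores. The following is a toy sketch of that flow in plain Java, not Lucene code. `OffsetScorer`, `TermOffsetScorer`, `collectOffsets`, `highlight` and `Span` are all hypothetical names. The "index" is an in-memory list of strings, and the final step simply wraps each match in `<b>` tags instead of building a TokenStream for the existing Highlighter, which is what Mike describes his LUCENE-3318 patch doing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy sketch (not Lucene code): highlight by re-running a "query" over a restricted
 * set of documents and collecting match offsets instead of scores.
 * All type and method names here are hypothetical.
 */
public class OffsetHighlightSketch {

    /** A match as half-open character offsets [start, end) into the document text. */
    public record Span(int start, int end) {}

    /** Hypothetical scorer-like iterator that can also report match offsets. */
    public interface OffsetScorer {
        /** Advances to the next matching document; returns -1 when exhausted. */
        int nextDoc();

        /** Offsets of the matches in the current document. */
        List<Span> offsets();
    }

    /** Toy scorer over an in-memory "index": matches every occurrence of one non-empty literal term. */
    public static final class TermOffsetScorer implements OffsetScorer {
        private final List<String> docs;
        private final String term;
        private int doc = -1;

        public TermOffsetScorer(List<String> docs, String term) {
            this.docs = docs;
            this.term = term;
        }

        @Override
        public int nextDoc() {
            for (doc = doc + 1; doc < docs.size(); doc++) {
                if (docs.get(doc).contains(term)) {
                    return doc;
                }
            }
            return -1;
        }

        @Override
        public List<Span> offsets() {
            List<Span> spans = new ArrayList<>();
            String text = docs.get(doc);
            // Non-overlapping occurrences, left to right.
            for (int i = text.indexOf(term); i >= 0; i = text.indexOf(term, i + term.length())) {
                spans.add(new Span(i, i + term.length()));
            }
            return spans;
        }
    }

    /**
     * "Collector" that keeps offsets rather than scores, and only for the documents
     * we want highlighted (this membership check stands in for the filter).
     */
    public static Map<Integer, List<Span>> collectOffsets(OffsetScorer scorer, Set<Integer> docsToHighlight) {
        Map<Integer, List<Span>> result = new TreeMap<>();
        for (int doc = scorer.nextDoc(); doc != -1; doc = scorer.nextDoc()) {
            if (docsToHighlight.contains(doc)) {
                result.put(doc, scorer.offsets());
            }
        }
        return result;
    }

    /** Wraps each span in bold tags; assumes spans are sorted and non-overlapping. */
    public static String highlight(String text, List<Span> spans) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Span s : spans) {
            sb.append(text, pos, s.start())
              .append("<b>").append(text, s.start(), s.end()).append("</b>");
            pos = s.end();
        }
        return sb.append(text, pos, text.length()).toString();
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the quick brown fox", "a lazy dog", "fox hunts fox", "another fox");
        // Doc 3 also matches "fox" but is not in the set of documents to highlight.
        Map<Integer, List<Span>> hits = collectOffsets(new TermOffsetScorer(docs, "fox"), Set.of(0, 2));
        for (Map.Entry<Integer, List<Span>> e : hits.entrySet()) {
            System.out.println(e.getKey() + ": " + highlight(docs.get(e.getKey()), e.getValue()));
        }
    }
}
```

Running `main` prints `0: the quick brown <b>fox</b>` and `2: <b>fox</b> hunts <b>fox</b>`; document 3 matches but is skipped because it is not in the requested set. In a real implementation the filter would let the scorer skip unwanted documents rather than visit and discard them, and the offsets would have to come from the index itself, which Robert notes the postings lists can now store.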