Alan, I merged the branch manually and created a new branch from it. It's here:
https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-2878
The branch compiles, but it still has lots of nocommits / TODOs.
If you have questions, please ask; I will help as much as I can.

simon

On Tue, May 22, 2012 at 8:38 PM, Alan Woodward <[email protected]> wrote:
> Hey, I reckon I can have a decent go at getting the branch updated. Is it
> best to work this out as a patch applying to trunk? Any patch that merges in
> all the trunk changes to the branch is going to be absolutely massive…
>
> On 17 May 2012, at 13:15, Simon Willnauer wrote:
>
>> ok man. I will try to merge up the branch. I tell you this is going to
>> be messy and it might not compile but I will make it reasonable so you
>> can start.
>>
>> simon
>>
>> On Thu, May 17, 2012 at 8:03 AM, Alan Woodward
>> <[email protected]> wrote:
>>> Sorry for vanishing for so long, life unexpectedly caught up with me...
>>> I'm going to have some time to look at this again next week though, if
>>> you're interested in picking it up again.
>>>
>>> On 21 Mar 2012, at 09:02, Alan Woodward wrote:
>>>
>>>> That would be great, thanks! I had a go at merging it last night, but
>>>> there are a *lot* of changes that I haven't got my head round yet, so it
>>>> was getting pretty messy.
>>>>
>>>> On 21 Mar 2012, at 08:49, Simon Willnauer wrote:
>>>>
>>>>> Alan, if you want I can just merge the branch up next week and we
>>>>> iterate from there?
>>>>>
>>>>> simon
>>>>>
>>>>> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
>>>>> <[email protected]> wrote:
>>>>>> Yep, the first challenge is always getting the old patch(es) to
>>>>>> apply.....
>>>>>>
>>>>>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>>>>>> <[email protected]> wrote:
>>>>>>> Thanks for all the offers of help! It looks as though most of the hard
>>>>>>> work has already been done, which is exactly where I like to pick up
>>>>>>> projects. :-)
>>>>>>>
>>>>>>> Maybe the best place to start would be for me to rebase the branch
>>>>>>> against trunk, and see what still fits? I think there have been some
>>>>>>> fairly major changes in the internals since July last year.
>>>>>>>
>>>>>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>>>>>
>>>>>>>> I posted a patch with a Collector somewhat similar to what you
>>>>>>>> described, Alan - it's attached to one of the sub-issues
>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly
>>>>>>>> complete "alpha" state, but has seen no production use of course,
>>>>>>>> since it relies on the remainder of the unfinished work in that
>>>>>>>> branch. It works by creating a TokenStream based on match positions
>>>>>>>> returned from the query and passing that to the existing Highlighter.
>>>>>>>> Please feel free to get in touch if you decide to look into that and
>>>>>>>> have questions.
>>>>>>>>
>>>>>>>> -Mike
>>>>>>>>
>>>>>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler<[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>>>>>
>>>>>>>>> yes I did
>>>>>>>>>
>>>>>>>>>> -----
>>>>>>>>>> Uwe Schindler
>>>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>>>> http://www.thetaphi.de
>>>>>>>>>> eMail: [email protected]
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Simon Willnauer [mailto:[email protected]]
>>>>>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>>>>>
>>>>>>>>>>> Alan, you made my day!
>>>>>>>>>>>
>>>>>>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>>>>>> quite a big one and it's in a very early stage. Still I want to
>>>>>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>>>>>> with the issues!
>>>>>>>>>>>
>>>>>>>>>>> let me know if you have questions!
>>>>>>>>>>>
>>>>>>>>>>> simon
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Cool, thanks Robert. I'll take a look at the JIRA ticket.
>>>>>>>>>>>>
>>>>>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>>>>>>> covered by the existing highlighter modules. I'm working round this
>>>>>>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>>>>>>> term offsets available - see
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826). This works for
>>>>>>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>>>>>> provide accurate highlighting in Lucene. I'm going to have a couple
>>>>>>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>>>>>>> implementing this. Mainly I'm wondering if there's already been
>>>>>>>>>>>>>> thoughts about how to do it.
>>>>>>>>>>>>>> My current thoughts are to somehow extend the Weight and Scorer
>>>>>>>>>>>>>> interface to make term offsets available; to get highlights for a
>>>>>>>>>>>>>> given set of documents, you'd essentially run the query again, with
>>>>>>>>>>>>>> a filter on just the documents you want highlighted, and have a
>>>>>>>>>>>>>> custom collector that gets the term offsets in place of the scores.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>>>>>
>>>>>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>>>>>> support offsets in the postings lists.
>>>>>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> lucidimagination.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>> For additional commands, e-mail: [email protected]
