OK. I will try to merge the branch up. Fair warning: this is going to
be messy and it might not compile, but I will get it into reasonable
shape so you can start.

simon

On Thu, May 17, 2012 at 8:03 AM, Alan Woodward
<alan.woodw...@romseysoftware.co.uk> wrote:
> Sorry for vanishing for so long, life unexpectedly caught up with me...  I'm 
> going to have some time to look at this again next week though, if you're 
> interested in picking it up again.
>
> On 21 Mar 2012, at 09:02, Alan Woodward wrote:
>
>> That would be great, thanks!  I had a go at merging it last night, but there 
>> are a *lot* of changes that I haven't got my head round yet, so it was 
>> getting pretty messy.
>>
>> On 21 Mar 2012, at 08:49, Simon Willnauer wrote:
>>
>>> Alan, if you want I can just merge the branch up next week and we
>>> iterate from there?
>>>
>>> simon
>>>
>>> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
>>> <erickerick...@gmail.com> wrote:
>>>> Yep, the first challenge is always getting the old patch(es) to apply.....
>>>>
>>>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>>>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>>>> Thanks for all the offers of help!  It looks as though most of the hard 
>>>>> work has already been done, which is exactly where I like to pick up 
>>>>> projects.  :-)
>>>>>
>>>>> Maybe the best place to start would be for me to rebase the branch 
>>>>> against trunk, and see what still fits?  I think there have been some 
>>>>> fairly major changes in the internals since July last year.
>>>>>
>>>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>>>>
>>>>>> I posted a patch with a Collector somewhat similar to what you 
>>>>>> described, Alan - it's attached to one of the sub-issues 
>>>>>> https://issues.apache.org/jira/browse/LUCENE-3318.   It is in a fairly 
>>>>>> complete "alpha" state, but has seen no production use of course, since 
>>>>>> it relies on the remainder of the unfinished work in that branch.  It 
>>>>>> works by creating a TokenStream based on match positions returned from 
>>>>>> the query and passing that to the existing Highlighter.  Please feel 
>>>>>> free to get in touch if you decide to look into that and have questions.
>>>>>>
>>>>>>
>>>>>> -Mike
>>>>>>
>>>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>>>>>
>>>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>>>>
>>>>>>> yes I did
>>>>>>>
>>>>>>>> -----
>>>>>>>> Uwe Schindler
>>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>>> http://www.thetaphi.de
>>>>>>>> eMail: u...@thetaphi.de
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>>>> To: dev@lucene.apache.org
>>>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>>>>
>>>>>>>>> Alan, you made my day!
>>>>>>>>>
>>>>>>>>> The branch is kind of outdated, but I looked at it lately and I can
>>>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>>>> quite a big one and it's in a very early stage. Still, I want to
>>>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>>>> with the issues!
>>>>>>>>>
>>>>>>>>> let me know if you have questions!
>>>>>>>>>
>>>>>>>>> simon
>>>>>>>>>
>>>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>>>>
>>>>>>>>>> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>>>>>>>>>>
>>>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> The project I'm currently working on requires the reporting of
>>>>>>>>>>>> exact hit positions from some pretty hairy queries, not all of
>>>>>>>>>>>> which are covered by the existing highlighter modules.  I'm
>>>>>>>>>>>> working round this by translating everything into SpanQueries,
>>>>>>>>>>>> and using the getSpans() method to locate hits (I've extended
>>>>>>>>>>>> the Spans interface to make term offsets available - see
>>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works
>>>>>>>>>>>> for our use-case, but isn't terribly efficient, and obviously
>>>>>>>>>>>> isn't applicable to non-Span queries.
>>>>>>>>>>>>
>>>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>>>> provide accurate highlighting in Lucene.  I'm going to have a
>>>>>>>>>>>> couple of weeks free in April, and I thought I might have a go at
>>>>>>>>>>>> implementing this.  Mainly I'm wondering if there's already been
>>>>>>>>>>>> thoughts about how to do it.  My current thoughts are to somehow
>>>>>>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>>>>> essentially run the query again, with a filter on just the
>>>>>>>>>>>> documents you want highlighted, and have a custom collector that
>>>>>>>>>>>> gets the term offsets in place of the scores.
>>>>>>>>>>>>
>>>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>>>>
>>>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>>>>
>>>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>>>> support offsets in the postings lists.
>>>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> lucidimagination.com
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>>>>>>>>>>> additional commands, e-mail: dev-h...@lucene.apache.org