Yep, the first challenge is always getting the old patch(es) to apply...

On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
<alan.woodw...@romseysoftware.co.uk> wrote:
> Thanks for all the offers of help!  It looks as though most of the hard work 
> has already been done, which is exactly where I like to pick up projects.  :-)
>
> Maybe the best place to start would be for me to rebase the branch against 
> trunk, and see what still fits?  I think there have been some fairly major 
> changes in the internals since July last year.
>
> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>
>> I posted a patch with a Collector somewhat similar to what you described, 
>> Alan - it's attached to one of the sub-issues 
>> https://issues.apache.org/jira/browse/LUCENE-3318.   It is in a fairly 
>> complete "alpha" state, but has seen no production use of course, since it 
>> relies on the remainder of the unfinished work in that branch.  It works by 
>> creating a TokenStream based on match positions returned from the query and 
>> passing that to the existing Highlighter.  Please feel free to get in touch 
>> if you decide to look into that and have questions.
>>
>>
>> -Mike
>>
>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>
>>>> Have you marked that for GSOC? Would be a good idea!
>>>>
>>>  yes I did
>>>
>>>> -----
>>>> Uwe Schindler
>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>> http://www.thetaphi.de
>>>> eMail: u...@thetaphi.de
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>> To: dev@lucene.apache.org
>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>
>>>>> Alan, you made my day!
>>>>>
>>>>> The branch is kind of outdated, but I looked at it lately and I can
>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>> quite a big one and it's in a very early stage. Still, I want to
>>>>> encourage you to take a look and work on it. I promise all my help
>>>>> with the issues!
>>>>>
>>>>> let me know if you have questions!
>>>>>
>>>>> simon
>>>>>
>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>
>>>>>> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>>>>>>
>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>
>>>>>>
>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>> covered by the existing highlighter modules.  I'm working round this
>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>> term offsets available - see
>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works for
>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>> applicable to non-Span queries.
>>>>>>>>
>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>> provide accurate highlighting in Lucene.  I'm going to have a couple
>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>> implementing this.  Mainly I'm wondering if there have already been
>>>>>>>> thoughts about how to do it.  My current thoughts are to somehow
>>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>> offsets in place of the scores.
>>>>>>>>
>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>
>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>> lagging behind trunk a bit.
>>>>>>>
>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>> support offsets in the postings lists.
>>>>>>> We've since added this and several codecs support it.
>>>>>>>
>>>>>>> --
>>>>>>> lucidimagination.com
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
