Alan, if you want I can just merge the branch up next week and we
iterate from there?

simon

On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
<erickerick...@gmail.com> wrote:
> Yep, the first challenge is always getting the old patch(es) to apply.....
>
> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
> <alan.woodw...@romseysoftware.co.uk> wrote:
>> Thanks for all the offers of help!  It looks as though most of the hard work 
>> has already been done, which is exactly where I like to pick up projects.  
>> :-)
>>
>> Maybe the best place to start would be for me to rebase the branch against 
>> trunk, and see what still fits?  I think there have been some fairly major 
>> changes in the internals since July last year.
>>
>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>
>>> I posted a patch with a Collector somewhat similar to what you described, 
>>> Alan - it's attached to one of the sub-issues 
>>> https://issues.apache.org/jira/browse/LUCENE-3318.   It is in a fairly 
>>> complete "alpha" state, but has seen no production use of course, since it 
>>> relies on the remainder of the unfinished work in that branch.  It works by 
>>> creating a TokenStream based on match positions returned from the query and 
>>> passing that to the existing Highlighter.  Please feel free to get in touch 
>>> if you decide to look into that and have questions.
>>>
>>>
>>> -Mike
>>>
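Mike's patch builds a TokenStream from the match positions returned by the query and passes it to the existing Highlighter. The following is not that patch and does not use Lucene's TokenStream or Highlighter classes. It is a self-contained illustration of the final step those classes perform, assuming the match offsets are already available as [start, end) character ranges: turning offset pairs into marked-up text. The class `OffsetMarkup` and its `highlight` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: applies [start, end) character
// offsets to a stored text by wrapping each range in <b>...</b>.
final class OffsetMarkup {
    static String highlight(String text, List<int[]> offsets) {
        // Sort a copy of the ranges by start offset.
        List<int[]> sorted = new ArrayList<>(offsets);
        sorted.sort(Comparator.comparingInt(r -> r[0]));

        // Merge overlapping or touching ranges so the tags never nest.
        // Copies are stored, so the caller's arrays are not modified.
        List<int[]> merged = new ArrayList<>();
        for (int[] r : sorted) {
            if (!merged.isEmpty() && r[0] <= merged.get(merged.size() - 1)[1]) {
                int[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], r[1]);
            } else {
                merged.add(new int[] {r[0], r[1]});
            }
        }

        // Copy unmatched text verbatim and wrap each matched range.
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] r : merged) {
            out.append(text, pos, r[0]).append("<b>").append(text, r[0], r[1]).append("</b>");
            pos = r[1];
        }
        out.append(text.substring(pos));
        return out.toString();
    }
}
```

For example, offsets {0, 5} and {12, 15} over "quick brown fox" yield "<b>quick</b> brown <b>fox</b>". Merging first is a design choice made here for simplicity; a real highlighter may instead score and select fragments.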
>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>>
>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>
>>>>  yes I did
>>>>
>>>>> -----
>>>>> Uwe Schindler
>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>> http://www.thetaphi.de
>>>>> eMail: u...@thetaphi.de
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>> To: dev@lucene.apache.org
>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>
>>>>>> Alan, you made my day!
>>>>>>
>>>>>> The branch is kind of outdated, but I looked at it recently and I can
>>>>>> certainly help get it up to speed. The feature in that branch is quite a
>>>>>> big one and it's at a very early stage. Still, I want to encourage you to
>>>>>> take a look and work on it. I promise all my help with the issues!
>>>>>>
>>>>>> let me know if you have questions!
>>>>>>
>>>>>> simon
>>>>>>
>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>
>>>>>>> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>>>>>>>
>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>> covered by the existing highlighter modules.  I'm working round this
>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>> term offsets available - see
>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works for
>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>
>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>> provide accurate highlighting in Lucene.  I'm going to have a couple
>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>> implementing this.  Mainly I'm wondering whether there have already
>>>>>>>>> been thoughts about how to do it.  My current thoughts are to somehow
>>>>>>>>> extend the Weight and Scorer interfaces to make term offsets
>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>>> offsets in place of the scores.
>>>>>>>>>
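Alan's proposal is to run the query a second time, filtered to just the documents to be highlighted, with a collector that gathers term offsets instead of scores. A minimal sketch of that control flow follows, assuming the query's matches are already available as (docId, startOffset, endOffset) triples, as a postings list with offsets could supply. `OffsetMatch`, `OffsetCollector` and `HighlightPass` are hypothetical names for illustration; they are not Lucene's Weight, Scorer or Collector API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical: one match of a query term in one document, carrying the
// character offsets that a postings list with offsets would provide.
final class OffsetMatch {
    final int docId;
    final int startOffset;
    final int endOffset;

    OffsetMatch(int docId, int startOffset, int endOffset) {
        this.docId = docId;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// Hypothetical collector: records offsets rather than scores, and only for
// the documents selected for highlighting (the "filter" in the proposal).
final class OffsetCollector {
    private final Set<Integer> docsToHighlight;
    private final List<OffsetMatch> collected = new ArrayList<>();

    OffsetCollector(Set<Integer> docsToHighlight) {
        this.docsToHighlight = docsToHighlight;
    }

    void collect(OffsetMatch match) {
        // Skip matches in documents nobody asked to highlight.
        if (docsToHighlight.contains(match.docId)) {
            collected.add(match);
        }
    }

    List<OffsetMatch> matches() {
        return Collections.unmodifiableList(collected);
    }
}

final class HighlightPass {
    // The second pass: iterate the same matches the search pass would see,
    // but route them to the offset-collecting collector.
    static List<OffsetMatch> run(List<OffsetMatch> queryMatches, Set<Integer> docs) {
        OffsetCollector collector = new OffsetCollector(docs);
        for (OffsetMatch m : queryMatches) {
            collector.collect(m);
        }
        return collector.matches();
    }
}
```

Restricting the second pass to a small set of documents keeps the extra cost proportional to the page of results being displayed rather than to the whole result set.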
>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>
>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>> lagging behind trunk a bit.
>>>>>>>>
>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>> support offsets in the postings lists.
>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>
>>>>>>>> --
>>>>>>>> lucidimagination.com
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
>>>>>>>> additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>>>>
>>>>>>>>