That would be great, thanks!  I had a go at merging it last night, but there 
are a *lot* of changes that I haven't got my head round yet, so it was getting 
pretty messy.

On 21 Mar 2012, at 08:49, Simon Willnauer wrote:

> Alan, if you want I can just merge the branch up next week and we
> iterate from there?
> 
> simon
> 
> On Tue, Mar 20, 2012 at 12:34 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>> Yep, the first challenge is always getting the old patch(es) to apply.....
>> 
>> On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward
>> <alan.woodw...@romseysoftware.co.uk> wrote:
>>> Thanks for all the offers of help!  It looks as though most of the hard 
>>> work has already been done, which is exactly where I like to pick up 
>>> projects.  :-)
>>> 
>>> Maybe the best place to start would be for me to rebase the branch against 
>>> trunk, and see what still fits?  I think there have been some fairly major 
>>> changes in the internals since July last year.
>>> 
>>> On 19 Mar 2012, at 17:07, Mike Sokolov wrote:
>>> 
>>>> I posted a patch with a Collector somewhat similar to what you described, 
>>>> Alan - it's attached to one of the sub-issues 
>>>> https://issues.apache.org/jira/browse/LUCENE-3318.   It is in a fairly 
>>>> complete "alpha" state, but has seen no production use of course, since it 
>>>> relies on the remainder of the unfinished work in that branch.  It works 
>>>> by creating a TokenStream based on match positions returned from the query 
>>>> and passing that to the existing Highlighter.  Please feel free to get in 
>>>> touch if you decide to look into that and have questions.
>>>> 
>>>> 
>>>> -Mike
>>>> 
>>>> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>>>>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler<u...@thetaphi.de>  wrote:
>>>>> 
>>>>>> Have you marked that for GSOC? Would be a good idea!
>>>>>> 
>>>>>  yes I did
>>>>> 
>>>>>> -----
>>>>>> Uwe Schindler
>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>> http://www.thetaphi.de
>>>>>> eMail: u...@thetaphi.de
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>>>>> To: dev@lucene.apache.org
>>>>>>> Subject: Re: Using term offsets for hit highlighting
>>>>>>> 
>>>>>>> Alan, you made my day!
>>>>>>> 
>>>>>>> The branch is kind of outdated but I looked at it lately and I can
>>>>>>> certainly help to get it up to speed. The feature in that branch is
>>>>>>> quite a big one and it's in a very early stage. Still I want to
>>>>>>> encourage you to take a look and work on it. I promise all my help
>>>>>>> with the issues!
>>>>>>> 
>>>>>>> let me know if you have questions!
>>>>>>> 
>>>>>>> simon
>>>>>>> 
>>>>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>> 
>>>>>>>> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>>>>>>>> 
>>>>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>>>> 
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>>>>> covered by the existing highlighter modules.  I'm working round this
>>>>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>>>>> term offsets available - see
>>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works for
>>>>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>>>>> applicable to non-Span queries.
>>>>>>>>>>
>>>>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>>>>> provide accurate highlighting in Lucene.  I'm going to have a couple
>>>>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>>>>> implementing this.  Mainly I'm wondering if there have already been
>>>>>>>>>> thoughts about how to do it.  My current thoughts are to somehow
>>>>>>>>>> extend the Weight and Scorer interface to make term offsets
>>>>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>>>>> offsets in place of the scores.
>>>>>>>>>> 
>>>>>>>>> Hi Alan, Simon started some initial work on
>>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>>>>> 
>>>>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>>>>> lagging behind trunk a bit.
>>>>>>>>> 
>>>>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>>>>> support offsets in the postings lists.
>>>>>>>>> We've since added this and several codecs support it.
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> lucidimagination.com
>>>>>>>>> 
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
