Thanks for all the offers of help!  It looks as though most of the hard work 
has already been done, which is exactly where I like to pick up projects.  :-)

Maybe the best place to start would be for me to rebase the branch against 
trunk, and see what still fits?  I think there have been some fairly major 
changes in the internals since July last year.

On 19 Mar 2012, at 17:07, Mike Sokolov wrote:

> I posted a patch with a Collector somewhat similar to what you described, 
> Alan - it's attached to one of the sub-issues 
> https://issues.apache.org/jira/browse/LUCENE-3318.  It is in a fairly
> complete "alpha" state but has, of course, seen no production use, since it
> relies on the remainder of the unfinished work in that branch.  It works by
> creating a TokenStream based on match positions returned from the query and 
> passing that to the existing Highlighter.  Please feel free to get in touch 
> if you decide to look into that and have questions.
> 
> 
> -Mike
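Mike's collector turns the match positions returned by the query into a TokenStream for the existing Highlighter. The last step of any such approach, turning a list of match offsets into marked-up text, can be sketched in plain Java. This is a minimal, self-contained sketch assuming matches arrive as half-open character ranges; `OffsetHighlighter` and `Match` are hypothetical names, and the sketch deliberately skips the TokenStream/Highlighter route rather than reproducing the patch on LUCENE-3318.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: wrap reported match offsets in highlight tags.
// Not Lucene's Highlighter API; all names here are illustrative.
public class OffsetHighlighter {

    // A half-open character range [start, end) for one match.
    record Match(int start, int end) {}

    // Sorts matches by start offset and merges ranges that overlap or touch.
    static List<Match> merge(List<Match> matches) {
        List<Match> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt(Match::start));
        List<Match> merged = new ArrayList<>();
        for (Match m : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && m.start() <= merged.get(last).end()) {
                // Extend the previous range instead of emitting nested tags.
                Match prev = merged.get(last);
                merged.set(last, new Match(prev.start(), Math.max(prev.end(), m.end())));
            } else {
                merged.add(m);
            }
        }
        return merged;
    }

    // Copies the text through, inserting pre/post around each merged range.
    static String highlight(String text, List<Match> matches, String pre, String post) {
        StringBuilder sb = new StringBuilder();
        int cursor = 0;
        for (Match m : merge(matches)) {
            sb.append(text, cursor, m.start())
              .append(pre)
              .append(text, m.start(), m.end())
              .append(post);
            cursor = m.end();
        }
        sb.append(text, cursor, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        List<Match> matches = List.of(new Match(10, 15), new Match(4, 9), new Match(6, 9));
        // prints: the <b>quick</b> <b>brown</b> fox
        System.out.println(highlight(text, matches, "<b>", "</b>"));
    }
}
```

Overlapping or adjacent matches (such as the two ranges inside "quick" above) are merged first, so the output never contains nested or back-to-back tags.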
> 
> On 03/19/2012 11:51 AM, Simon Willnauer wrote:
>> On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler<u...@thetaphi.de>  wrote:
>>   
>>> Have you marked that for GSOC? Would be a good idea!
>>>     
>>  yes I did
>>   
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>> 
>>> 
>>>     
>>>> -----Original Message-----
>>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>>> Sent: Monday, March 19, 2012 4:43 PM
>>>> To: dev@lucene.apache.org
>>>> Subject: Re: Using term offsets for hit highlighting
>>>> 
>>>> Alan, you made my day!
>>>> 
>>>> The branch is kind of outdated but I looked at it lately and I can certainly
>>>> help to get it up to speed. The feature in that branch is quite a big one and
>>>> it's in a very early stage. Still, I want to encourage you to take a look and
>>>> work on it. I promise all my help with the issues!
>>>> 
>>>> let me know if you have questions!
>>>> 
>>>> simon
>>>> 
>>>> On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>       
>>>>> Cool, thanks Robert.  I'll take a look at the JIRA ticket.
>>>>> 
>>>>> On 19 Mar 2012, at 14:44, Robert Muir wrote:
>>>>> 
>>>>>         
>>>>>> On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
>>>>>> <alan.woodw...@romseysoftware.co.uk>  wrote:
>>>>>>           
>>>>>>> Hello,
>>>>>>> 
>>>>>>> The project I'm currently working on requires the reporting of exact
>>>>>>> hit positions from some pretty hairy queries, not all of which are
>>>>>>> covered by the existing highlighter modules.  I'm working round this
>>>>>>> by translating everything into SpanQueries, and using the getSpans()
>>>>>>> method to locate hits (I've extended the Spans interface to make
>>>>>>> term offsets available - see
>>>>>>> https://issues.apache.org/jira/browse/LUCENE-3826).  This works for
>>>>>>> our use-case, but isn't terribly efficient, and obviously isn't
>>>>>>> applicable to non-Span queries.
>>>>>>>
>>>>>>> I've seen a bit of chatter on the list about using term offsets to
>>>>>>> provide accurate highlighting in Lucene.  I'm going to have a couple
>>>>>>> of weeks free in April, and I thought I might have a go at
>>>>>>> implementing this.  Mainly I'm wondering if there's already been
>>>>>>> thoughts about how to do it.  My current thoughts are to somehow
>>>>>>> extend the Weight and Scorer interfaces to make term offsets
>>>>>>> available; to get highlights for a given set of documents, you'd
>>>>>>> essentially run the query again, with a filter on just the documents
>>>>>>> you want highlighted, and have a custom collector that gets the term
>>>>>>> offsets in place of the scores.
>>>>>>>             
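Alan's plan (run the query again, restricted to the documents being displayed, with a collector that records term offsets where it would normally record scores) can be sketched without any Lucene dependency. `OffsetScorer`, `OffsetCollector`, `highlightDocs` and `fromMap` are hypothetical stand-ins, not the real Weight/Scorer/Collector API; the document filter is modelled as a simple doc-id set check before collection, and the end-of-iteration sentinel is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of "re-run the query, collect offsets instead of scores".
// OffsetScorer and OffsetCollector are invented stand-ins, not Lucene types.
public class OffsetCollectionSketch {

    // Character offsets [start, end) of one match in a document's stored text.
    record Offset(int start, int end) {}

    // Stand-in for a Scorer that can also report match offsets for the current doc.
    interface OffsetScorer {
        int END = -1;             // hypothetical "no more docs" sentinel
        int nextDoc();            // next matching doc id, or END
        List<Offset> offsets();   // match offsets in the current doc
    }

    // Collector that records offsets where a normal collector would record scores.
    static class OffsetCollector {
        final Map<Integer, List<Offset>> hits = new TreeMap<>();

        void collect(int doc, OffsetScorer scorer) {
            hits.put(doc, new ArrayList<>(scorer.offsets()));
        }
    }

    // Re-runs the query, passing only the wanted docs to the collector.
    static Map<Integer, List<Offset>> highlightDocs(OffsetScorer scorer, Set<Integer> wanted) {
        OffsetCollector collector = new OffsetCollector();
        for (int doc = scorer.nextDoc(); doc != OffsetScorer.END; doc = scorer.nextDoc()) {
            if (wanted.contains(doc)) {   // the "filter" on documents to highlight
                collector.collect(doc, scorer);
            }
        }
        return collector.hits;
    }

    // Toy scorer that replays pre-computed matches from a doc-id-ordered map.
    static OffsetScorer fromMap(SortedMap<Integer, List<Offset>> matches) {
        Iterator<Map.Entry<Integer, List<Offset>>> it = matches.entrySet().iterator();
        return new OffsetScorer() {
            private List<Offset> current = List.of();

            @Override
            public int nextDoc() {
                if (!it.hasNext()) {
                    return END;
                }
                Map.Entry<Integer, List<Offset>> e = it.next();
                current = e.getValue();
                return e.getKey();
            }

            @Override
            public List<Offset> offsets() {
                return current;
            }
        };
    }

    public static void main(String[] args) {
        SortedMap<Integer, List<Offset>> matches = new TreeMap<>();
        matches.put(1, List.of(new Offset(0, 5)));
        matches.put(4, List.of(new Offset(10, 15), new Offset(30, 34)));
        matches.put(7, List.of(new Offset(2, 6)));
        // Only docs 4 and 7 are on the results page being highlighted.
        System.out.println(highlightDocs(fromMap(matches), Set.of(4, 7)));
    }
}
```

In a real engine the filter would prune documents before any per-document work is done; checking the set inside the loop keeps the sketch short while still showing that offsets, not scores, are what the collector keeps.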
>>>>>> Hi Alan, Simon started some initial work on
>>>>>> https://issues.apache.org/jira/browse/LUCENE-2878
>>>>>> 
>>>>>> Some work and prototypes were done in a branch, but it might be
>>>>>> lagging behind trunk a bit.
>>>>>> 
>>>>>> Additionally at the time it was first done, I think we didn't yet
>>>>>> support offsets in the postings lists.
>>>>>> We've since added this and several codecs support it.
>>>>>> 
>>>>>> --
>>>>>> lucidimagination.com
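To illustrate Robert's point about offsets in the postings lists: if each postings entry stores start and end character offsets next to the token position, a highlighter can read match offsets straight from the index instead of re-analysing the stored text. The toy index below is a minimal sketch assuming a whitespace tokenizer and an in-memory map; `OffsetPostingsSketch` and `Posting` are hypothetical names, and this is not Lucene's codec format or postings API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy index whose postings store character offsets next to positions.
// Illustrative only: not Lucene's codec format or postings API.
public class OffsetPostingsSketch {

    // One occurrence of a term: doc id, token position, offsets [startOffset, endOffset).
    record Posting(int doc, int position, int startOffset, int endOffset) {}

    // term -> postings, appended in (doc, position) order as documents are added.
    final Map<String, List<Posting>> postings = new HashMap<>();

    // Indexes a document with a trivial whitespace tokenizer, recording offsets.
    void add(int doc, String text) {
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (start < i) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                postings.computeIfAbsent(term, t -> new ArrayList<>())
                        .add(new Posting(doc, position++, start, i));
            }
        }
    }

    // Match offsets for one term in one doc, read from postings without re-analysis.
    List<int[]> offsets(String term, int doc) {
        List<int[]> out = new ArrayList<>();
        for (Posting p : postings.getOrDefault(term, List.of())) {
            if (p.doc() == doc) {
                out.add(new int[] {p.startOffset(), p.endOffset()});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        OffsetPostingsSketch index = new OffsetPostingsSketch();
        index.add(0, "Fox jumps over the lazy fox");
        for (int[] o : index.offsets("fox", 0)) {
            System.out.println(o[0] + "-" + o[1]);  // prints 0-3, then 24-27
        }
    }
}
```

Because the offsets are stored per occurrence, the lowercased term "fox" still maps back to the original "Fox" at characters 0-3, which is exactly what a highlighter needs.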
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>> 