I posted a patch with a Collector somewhat similar to what you
described, Alan - it's attached to one of the sub-issues
https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly
complete "alpha" state, but has seen no production use of course, since
it relies on the remainder of the unfinished work in that branch. It
works by creating a TokenStream based on match positions returned from
the query and passing that to the existing Highlighter. Please feel
free to get in touch if you decide to look into that and have questions.
-Mike
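The idea Mike describes (turn match positions into a token-like stream, then hand that to a highlighter) can be sketched roughly as follows. This is only an illustration in plain Java: `MatchToken`, `tokensFrom` and `highlight` are hypothetical stand-ins, not Lucene's TokenStream or Highlighter classes and not the code in the LUCENE-3318 patch. It assumes the match spans arrive sorted and non-overlapping.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetHighlightSketch {

    // Hypothetical token: the matched text plus its [start, end) character offsets.
    static final class MatchToken {
        final String text;
        final int start;
        final int end;

        MatchToken(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Build a token list from match positions (assumed sorted and non-overlapping).
    static List<MatchToken> tokensFrom(String source, int[][] spans) {
        List<MatchToken> tokens = new ArrayList<>();
        for (int[] span : spans) {
            tokens.add(new MatchToken(source.substring(span[0], span[1]), span[0], span[1]));
        }
        return tokens;
    }

    // Stand-in for a highlighter: wrap every token's span in <b>...</b>.
    static String highlight(String source, List<MatchToken> tokens) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (MatchToken token : tokens) {
            out.append(source, cursor, token.start);        // text before the match
            out.append("<b>").append(token.text).append("</b>");
            cursor = token.end;
        }
        out.append(source, cursor, source.length());        // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "quick brown fox";
        int[][] spans = {{0, 5}, {12, 15}};
        System.out.println(highlight(source, tokensFrom(source, spans)));
    }
}
```

The point of the split is that the highlighting step only ever sees tokens with offsets, so it does not need to know which kind of query produced them.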
On 03/19/2012 11:51 AM, Simon Willnauer wrote:
On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
Have you marked that for GSOC? Would be a good idea!
Yes, I did.
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
Sent: Monday, March 19, 2012 4:43 PM
To: dev@lucene.apache.org
Subject: Re: Using term offsets for hit highlighting
Alan, you made my day!
The branch is kind of outdated, but I looked at it lately and I can
certainly help to get it up to speed. The feature in that branch is
quite a big one and it's in a very early stage. Still, I want to
encourage you to take a look and work on it. I promise all my help with
the issues!
Let me know if you have questions!
simon
On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward
<alan.woodw...@romseysoftware.co.uk> wrote:
Cool, thanks Robert. I'll take a look at the JIRA ticket.
On 19 Mar 2012, at 14:44, Robert Muir wrote:
On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward
<alan.woodw...@romseysoftware.co.uk> wrote:
Hello,
The project I'm currently working on requires the reporting of exact
hit positions from some pretty hairy queries, not all of which are
covered by the existing highlighter modules. I'm working round this
by translating everything into SpanQueries, and using the getSpans()
method to locate hits (I've extended the Spans interface to make
term offsets available - see
https://issues.apache.org/jira/browse/LUCENE-3826). This works for
our use-case, but isn't terribly efficient, and obviously isn't applicable to
non-Span queries.
I've seen a bit of chatter on the list about using term offsets to
provide accurate highlighting in Lucene. I'm going to have a couple
of weeks free in April, and I thought I might have a go at
implementing this. Mainly I'm wondering if there have already been
thoughts about how to do it. My current thoughts are to somehow
extend the Weight and Scorer interface to make term offsets
available; to get highlights for a given set of documents, you'd
essentially run the query again, with a filter on just the documents
you want highlighted, and have a custom collector that gets the term
offsets in place of the scores.
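A rough sketch of the collector idea above, in plain Java with no Lucene dependency. Everything here is a hypothetical stand-in: `postingsFor` fakes a postings list that stores offsets for a single term, and `collectOffsets` plays the role of re-running the query with a filter on the documents to highlight, gathering offsets instead of scores. It is not how Weight, Scorer or Collector are actually defined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class OffsetCollectorSketch {

    // Hypothetical postings with offsets: docId -> [start, end) spans of one term.
    static Map<Integer, List<int[]>> postingsFor(String term, List<String> docs) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("term must be non-empty");
        }
        Map<Integer, List<int[]>> postings = new TreeMap<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            String text = docs.get(doc);
            int from = text.indexOf(term);
            while (from >= 0) {
                postings.computeIfAbsent(doc, d -> new ArrayList<>())
                        .add(new int[] {from, from + term.length()});
                from = text.indexOf(term, from + term.length());
            }
        }
        return postings;
    }

    // "Run the query again" over only the filtered documents,
    // collecting term offsets in place of scores.
    static Map<Integer, List<int[]>> collectOffsets(
            Map<Integer, List<int[]>> postings, Set<Integer> docsToHighlight) {
        Map<Integer, List<int[]>> hits = new TreeMap<>();
        for (Map.Entry<Integer, List<int[]>> entry : postings.entrySet()) {
            if (docsToHighlight.contains(entry.getKey())) {
                hits.put(entry.getKey(), entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("lucene offsets in lucene", "no match here", "lucene");
        Set<Integer> wanted = new HashSet<>(Arrays.asList(0, 1));
        Map<Integer, List<int[]>> hits = collectOffsets(postingsFor("lucene", docs), wanted);
        for (Map.Entry<Integer, List<int[]>> entry : hits.entrySet()) {
            for (int[] span : entry.getValue()) {
                System.out.println("doc " + entry.getKey() + ": " + span[0] + "-" + span[1]);
            }
        }
    }
}
```

In this toy version the filter is just a set of document ids; document 2 matches the term but is skipped because it was not requested for highlighting.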
Hi Alan, Simon started some initial work on
https://issues.apache.org/jira/browse/LUCENE-2878
Some work and prototypes were done in a branch, but it might be
lagging behind trunk a bit.
Additionally, at the time it was first done, I think we didn't yet
support offsets in the postings lists.
We've since added this and several codecs support it.
--
lucidimagination.com
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org