It must be time to eat lunch, since the more I stare at this code, the less
sense it makes to me. Which is a sure sign that I need a break <G>.
But a couple of things.
1. My test cases throw some exceptions with the code as-is. The spans.get(0)
is a problem in that it's not guaranteed that the spans returned will have
anything in them. Also, I don't think that the test for reqSpans.get(0).next
in queryClauses[i].isRequired is correct (even if it doesn't
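For anyone following along, the empty-list problem with spans.get(0) can be guarded against roughly like this. This is only a sketch: it assumes the extracted spans are held in a plain java.util.List, and SpanGuard and first are made-up names for illustration, not part of the SpansExtractor code under discussion.

```java
import java.util.List;
import java.util.Optional;

class SpanGuard {
    /**
     * Returns the first element of the list, or an empty Optional when the
     * list is null or has nothing in it, instead of letting get(0) throw.
     */
    static <T> Optional<T> first(List<T> spans) {
        if (spans == null || spans.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(spans.get(0));
    }
}
```

The caller can then skip a clause when first(...) comes back empty rather than catching an IndexOutOfBoundsException.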
That may be the difference then. I'm actually working with both a complete
index and a memory index, depending on what phase I'm in. It turns out that
I probably can't put the document in a memory index on the fly
because...well...because <G>... That said, though, I can pretty easily use
this as a
I hope you're all following this old thread, because I've just run into
something I don't quite know what to do about with the SpansExtractor code
that I shamelessly stole.
Let's say my text is a b c d e f g h and my query is a AND z. The
implementation I stole for SpansExtractor (mentioned
Good catch, Erick! I'll have to tackle this as well. Mark H is the
originator of that code so maybe he will chime in, but what I am thinking
is this:
In the getSpansFromBooleanquery, keep track of which clauses are
required. Then based on whether any Spans are actually returned from
getSpansFromTerm
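A rough sketch of that idea, for the a AND z case on the text a b c d e f g h. The message above is cut off before it says what happens when a required clause returns nothing, so dropping all hits in that case is an assumption here. Clause, positionsOf and collect are hypothetical names that work on plain token lists, not the real getSpansFromBooleanquery or getSpansFromTerm.

```java
import java.util.ArrayList;
import java.util.List;

class RequiredClauseCheck {
    /** Hypothetical stand-in for a boolean clause: one term plus a required flag. */
    static final class Clause {
        final String term;
        final boolean required;

        Clause(String term, boolean required) {
            this.term = term;
            this.required = required;
        }
    }

    /** Token positions at which the term occurs in the text. */
    static List<Integer> positionsOf(String term, List<String> tokens) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) {
                hits.add(i);
            }
        }
        return hits;
    }

    /**
     * Collects hit positions for all clauses, but returns nothing at all if
     * any required clause has no hits (assumed behaviour, see above).
     */
    static List<Integer> collect(List<Clause> clauses, List<String> tokens) {
        List<Integer> all = new ArrayList<>();
        for (Clause c : clauses) {
            List<Integer> hits = positionsOf(c.term, tokens);
            if (c.required && hits.isEmpty()) {
                return new ArrayList<>(); // a required clause found nothing
            }
            all.addAll(hits);
        }
        return all;
    }
}
```

With both a and z marked required, collect returns an empty list for a b c d e f g h, so a is not highlighted on its own. With a and b required it returns positions 0 and 1.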
Mark:
Thanks, that reassures me that I'm not hallucinating. If it gets on my
priority list I can certainly share the code, since I stole it in the first
place <G>. I have a semi-solution for now that gets me out from under the
immediate problem, but it really wants a more robust solution than the
Here is my initial attempt...I believe it might be sufficient:
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import
Excellent! I'll give it a whirl in the morning. This may keep me from having
to rebuild my index as well, oh joy!
Thanks
Erick
On 2/15/07, Mark Miller [EMAIL PROTECTED] wrote:
Here is my initial attempt...I believe it might be sufficient:
import org.apache.lucene.index.IndexReader;
import
I have been away from this for a week, but my interest has started
building again. The whole spans implementation seems to work great for
finding the actual hits but there is a somewhat annoying limitation:
because I am using Spans it seems I can only either highlight the entire
found span or
a new SpansBasedHighlighter.
Cheers,
Mark
- Original Message
From: Mark Miller [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Friday, 2 February, 2007 3:58:01 PM
Subject: Re: Multiword Highlighting
I have been away from this for a week, but my interest has started
building again
mark harwood wrote:
Hi Mark,
Have you looked at the returned spans from any other potential problem scenarios (other than the 3
word one you suggest) e.g. complex nested SpanOr or SpanNot logic?
Nothing super intense, but I have looked at some semi-complex nesting and
it all looks great if
For what it's worth, Mark (Miller), there *is* a need for "just
highlight the query terms without trying to get excerpts" functionality
- something a la Google cache (different colours...mmm, nice).
FWIW, the existing highlighter doesn't *have* to fragment - just pass a
NullFragmenter to the
I do use the NullFragmenter now. I have no interest in the fragments at
the moment, just in showing hits on the source document. It would be
great if I could just show the real hits though. The span approach seems
to work fine for me. I have even tested the highlighting using my
sentence and
Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)? Isn't it you that suggested
turning the query into a SpanQuery, extracting the spans and then doing
the highlighting after a rewrite? This seems somewhat trivial so what am
I missing?
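To make the "extract the spans, then highlight" step concrete, here is a minimal sketch. It assumes the spans have already been extracted as [start, end) token positions and that the text is already split into tokens. SpanHighlighter and the <b> tags are illustrative choices only, not the contrib highlighter's API.

```java
import java.util.List;

class SpanHighlighter {
    /**
     * Wraps every token that falls inside any span in <b>...</b>.
     * Each span is an int[]{start, end} over token positions, end exclusive.
     */
    static String highlight(List<String> tokens, List<int[]> spans) {
        boolean[] hit = new boolean[tokens.size()];
        for (int[] span : spans) {
            int from = Math.max(0, span[0]);
            int to = Math.min(tokens.size(), span[1]);
            for (int i = from; i < to; i++) {
                hit[i] = true;
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            if (hit[i]) {
                out.append("<b>").append(tokens.get(i)).append("</b>");
            } else {
                out.append(tokens.get(i));
            }
        }
        return out.toString();
    }
}
```

Because tokens are only marked when a span covers them, a term that appears outside every span is left alone, which is the whole point of going through spans instead of plain term matching.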
Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)? I
I haven't conducted a survey but it's the typical web search engine
scenario - select only a small subset of the matching document content
for display in SERPS. I would expect that
markharw00d wrote:
Isn't it semi trivial if you are not interested in the fragments (I
swear it seems that most people are not)? I
I haven't conducted a survey but it's the typical web search engine
scenario - select only a small subset of the matching document content
for display in
Maybe a new highlighter with no attempt at summarising could more
easily address phrase support for small pieces of content. It will
always be hard to faithfully represent all possible query match logic
- especially if there are NOTs, ANDs and ORs mixed in with all the
term proximity
contrib to
contrib/ if you end up working on this.
Otis
--
Simpy -- http://www.simpy.com/ -- Tag. Search. Share.
- Original Message
From: Mark Miller [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Sunday, January 28, 2007 7:39:29 AM
Subject: Re: Multiword Highlighting
Hi,
I'm wondering what the best way is to do highlighting of multiword phrases.
For example, if a search is for "president kennedy", how can I make sure
that "president" is only highlighted if it is next to "kennedy" and
"president" in "president clinton" is not?
I haven't figured out where in the process
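To pin down the behaviour being asked for, a minimal sketch of phrase-aware matching on plain tokens. It assumes exact, case-sensitive token equality and no slop. PhraseMatcher and phraseStarts are hypothetical names and do not use the Lucene highlighter at all.

```java
import java.util.ArrayList;
import java.util.List;

class PhraseMatcher {
    /**
     * Returns the token positions where the whole phrase starts, so a term
     * is only reported when the rest of the phrase follows it directly.
     */
    static List<Integer> phraseStarts(List<String> tokens, List<String> phrase) {
        List<Integer> starts = new ArrayList<>();
        int n = phrase.size();
        if (n == 0) {
            return starts; // nothing to match
        }
        for (int i = 0; i + n <= tokens.size(); i++) {
            if (tokens.subList(i, i + n).equals(phrase)) {
                starts.add(i);
            }
        }
        return starts;
    }
}
```

For the tokens president kennedy met president clinton and the phrase president kennedy, this reports only position 0, so the president before clinton would not be highlighted.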
This is a deficiency in the highlighter functionality that has been
discussed several times before. The summary is - not a trivial fix.
See here for background:
http://marc2.theaimsgroup.com/?l=lucene-user&m=114631181214303&w=1