For your field configuration, the TokenStream you get with getAnyTokenStream is built from TermVectors.
What tokenizer do you use to populate your field? Have you checked with Luke that your term vectors are OK? And what version of Lucene? A change was made to this code recently, for another issue (apparently unrelated, but who knows?). See https://issues.apache.org/jira/browse/LUCENE-2874

Pierre

From: Cescy [mailto:ee07b...@gmail.com]
Sent: Friday, March 18, 2011 07:32
To: java-user; Pierre GOSSE
Subject: Re: RE: About highlighter

Yes, I only search the "contents" field. And I can print the whole contents with doc.get("contents") if there are any keywords in it. But if the number of words is too large, it cannot highlight the keywords in the last part of the contents, as if the highlighter had a word limit.

document.add(new Field("contens", value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

Thx
Gong

------------------ Original ------------------
From: "Pierre GOSSE" <pierre.go...@arisem.com>
Date: Thu, Mar 17, 2011 04:25 PM
To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
Subject: RE: About highlighter

500 is the maximum size of the text fragments returned by the highlighter. It shouldn't be the problem here, as far as I understand highlighting.

Gong Li, how is the field "contents" defined? Is it the only field the search is made on?

Pierre

-----Original Message-----
From: Ian Lea [mailto:ian....@gmail.com]
Sent: Wednesday, March 16, 2011 22:29
To: java-user@lucene.apache.org
Subject: Re: About highlighter

I know nothing about highlighting but that 500 looks like a good place to start investigating.

--
Ian.
On Tue, Mar 15, 2011 at 8:47 PM, Cescy <ee07b...@gmail.com> wrote:
> Hi,
>
> My highlighting code is as follows:
>
> QueryScorer scorer = new QueryScorer(query);
> Highlighter highlighter = new Highlighter(simpleHTMLFormatter, scorer);
> highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 500));
> String contents = doc.get("contents");
> TokenStream tokenStream = TokenSources.getAnyTokenStream(
>     searcher.getIndexReader(), topDocs.scoreDocs[i].doc, "contents", doc, analyzer);
> String[] snippet = highlighter.getBestFragments(tokenStream, contents, 10);
>
> snippet holds the resulting fragments, which I then print to the screen.
> But if I search for a keyword that appears in the last few paragraphs and the essay is
> too long (1000-2000 words), it returns "document found" with
> snippet.length == 0 (i.e. the document is found but the context is NOT found). Why?
>
> How can I fix the problem?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
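One thing worth checking, besides the fragment size of 500 discussed above: Lucene's Highlighter only analyzes a bounded prefix of the text when building fragments (configurable via Highlighter.setMaxDocCharsToAnalyze in recent versions), so a query term that only occurs past that cutoff can produce a hit with zero fragments. The sketch below is NOT Lucene code; AnalyzeBudgetDemo and findFragments are invented names that just model that failure mode in isolation, under the assumption that an analyze budget is the cause.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a highlighter that, like Lucene's Highlighter, only looks at
 * the first maxCharsToAnalyze characters of the text. All names here are
 * hypothetical, invented for illustration.
 */
public class AnalyzeBudgetDemo {

    /**
     * Return up to maxFragments snippets around each match of term, but only
     * for matches that start within the first maxCharsToAnalyze characters.
     */
    static List<String> findFragments(String text, String term,
                                      int maxCharsToAnalyze, int fragSize,
                                      int maxFragments) {
        List<String> fragments = new ArrayList<>();
        int from = 0;
        while (fragments.size() < maxFragments) {
            int hit = text.indexOf(term, from);
            if (hit < 0 || hit >= maxCharsToAnalyze) {
                break; // no match, or match lies past the analyze budget
            }
            int start = Math.max(0, hit - fragSize / 2);
            int end = Math.min(text.length(), start + fragSize);
            fragments.add(text.substring(start, end));
            from = hit + term.length();
        }
        return fragments;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) sb.append("filler "); // ~14,000 chars of padding
        sb.append("needle");                                 // keyword only at the very end
        String essay = sb.toString();

        // Small budget: the term sits beyond the analyzed prefix, so zero
        // fragments come back even though the document itself matches.
        System.out.println(findFragments(essay, "needle", 500, 100, 10).size());

        // Budget covering the whole text: the fragment is found.
        System.out.println(findFragments(essay, "needle", essay.length(), 100, 10).size());
    }
}
```

If the analyze budget turns out to be the culprit in the real setup, the analogous check on the Lucene side would be to raise the highlighter's max-chars-to-analyze limit above the stored field length before calling getBestFragments.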