For your field configuration, the TokenStream you get with getAnyTokenStream is built from TermVectors.
What tokenizer do you use to populate your field? Have you checked with Luke that your term vectors are OK? And what version of Lucene? A change was made to this code recently, for another issue (apparently unrelated, but who knows?). See https://issues.apache.org/jira/browse/LUCENE-2874

Pierre

From: Cescy [mailto:ee07b...@gmail.com]
Sent: Friday, March 18, 2011 07:32
To: java-user; Pierre GOSSE
Subject: Re: RE: About highlighter

Yes, I only search the "contents" field. And I can print the whole contents with doc.get("contents") if there are any keywords in it. But if the number of words is too large, it cannot highlight the keywords in the last part of the contents, as if the highlighter had a word limit.

document.add(new Field("contens", value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

Thx
Gong

------------------ Original ------------------
From: "Pierre GOSSE" <pierre.go...@arisem.com>
Date: Thu, Mar 17, 2011 04:25 PM
To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
Subject: RE: About highlighter

500 is the maximum size of the text fragments returned by the highlighter. It shouldn't be the problem here, as far as I understand highlighting.

Gong Li, how is the field "contents" defined? Is it the only field the search is made on?

Pierre

-----Original Message-----
From: Ian Lea [mailto:ian....@gmail.com]
Sent: Wednesday, March 16, 2011 22:29
To: java-user@lucene.apache.org
Subject: Re: About highlighter

I know nothing about highlighting but that 500 looks like a good place to start investigating.

--
Ian.
On Tue, Mar 15, 2011 at 8:47 PM, Cescy <ee07b...@gmail.com> wrote:
> Hi,
>
> My highlighting code is as follows:
>
> QueryScorer scorer = new QueryScorer(query);
> Highlighter highlighter = new Highlighter(simpleHTMLFormatter, scorer);
> highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 500));
> String contents = doc.get("contents");
> TokenStream tokenStream = TokenSources.getAnyTokenStream(
>     searcher.getIndexReader(), topDocs.scoreDocs[i].doc, "contents", doc, analyzer);
> String[] snippet = highlighter.getBestFragments(tokenStream, contents, 10);
>
> snippet holds the resulting fragments, which I then print to the screen.
> But if I search for a keyword that appears in the last few paragraphs and the essay is
> too long (1000-2000 words), it returns "document found" with
> snippet.length == 0 (i.e. the document is found but the context is NOT found). Why?
>
> How can I fix the problem?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
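One thing worth checking, besides the fragment size of 500 discussed above: Lucene's Highlighter only analyzes a bounded prefix of the text when building fragments (configurable via Highlighter.setMaxDocCharsToAnalyze in recent versions), so a query term that only occurs past that cutoff can produce a hit with zero fragments. The sketch below is NOT Lucene code; AnalyzeBudgetDemo and findFragments are invented names that just model that failure mode in isolation, under the assumption that an analyze budget is the cause.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a highlighter that, like Lucene's Highlighter, only looks at
 * the first maxCharsToAnalyze characters of the text. All names here are
 * hypothetical, invented for illustration.
 */
public class AnalyzeBudgetDemo {

    /**
     * Return up to maxFragments snippets around each match of term, but only
     * for matches that start within the first maxCharsToAnalyze characters.
     */
    static List<String> findFragments(String text, String term,
                                      int maxCharsToAnalyze, int fragSize,
                                      int maxFragments) {
        List<String> fragments = new ArrayList<>();
        int from = 0;
        while (fragments.size() < maxFragments) {
            int hit = text.indexOf(term, from);
            if (hit < 0 || hit >= maxCharsToAnalyze) {
                break; // no match, or match lies past the analyze budget
            }
            int start = Math.max(0, hit - fragSize / 2);
            int end = Math.min(text.length(), start + fragSize);
            fragments.add(text.substring(start, end));
            from = hit + term.length();
        }
        return fragments;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) sb.append("filler "); // ~14,000 chars of padding
        sb.append("needle");                                 // keyword only at the very end
        String essay = sb.toString();

        // Small budget: the term sits beyond the analyzed prefix, so zero
        // fragments come back even though the document itself matches.
        System.out.println(findFragments(essay, "needle", 500, 100, 10).size());

        // Budget covering the whole text: the fragment is found.
        System.out.println(findFragments(essay, "needle", essay.length(), 100, 10).size());
    }
}
```

If the analyze budget turns out to be the culprit in the real setup, the analogous check on the Lucene side would be to raise the highlighter's max-chars-to-analyze limit above the stored field length before calling getBestFragments.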