Also, there's a default of 10,000 tokens per field at index time....
Erick On 2/9/07, mark harwood <[EMAIL PROTECTED]> wrote:
See Highlighter.setMaxDocBytesToAnalyze(int byteCount) It's default setting is limited in order to avoid excessive response times. Cheers Mark ----- Original Message ---- From: Fred Eaker <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, 9 February, 2007 4:28:36 PM Subject: Highlighter returning incomplete field text Is there a limit to how many characters a Highlighter or NullFragmenter will return? I have indexed an entire HTML document (145kb). When I use the highlighter with a NullFragmenter, the getBestFragment and getBestFragments methods return the text of the field up to 51316 characters. I have tried indexing other HTML documents as well, but get the same results. If I change the Highlighter's Encoder to DefaultEncoder, I get more characters, but not the entire field. Here is some code: Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new DefaultEncoder(), new QueryScorer(query)); highlighter.setTextFragmenter(new NullFragmenter()); TokenStream tokenStream = LuceneUtils.getAnalyzer().tokenStream( fieldName, new StringReader(hit.get(fieldName))); String highlightedHit = highlighter.getBestFragment(tokenStream, hit.get(fieldName)); --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] ___________________________________________________________ The all-new Yahoo! Mail goes wherever you go - free your email address from your Internet provider. http://uk.docs.yahoo.com/nowyoucan.html --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]