Hi, I am trying to write a TokenFilter that simply concatenates all the tokens in the input TokenStream. The issue I am facing is that my filter outputs junk characters in addition to the concatenated string. I believe this is caused by StringBuilder.
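To illustrate the kind of StringBuilder behaviour I suspect, here is a minimal standalone sketch in plain Java, with no Lucene involved. The class name, the helper methods and the capacity of 14 are all made up for illustration. It assumes the term text lives in a reusable char[] that is larger than the text itself, which is how I understand the array returned by CharTermAttribute.buffer() relates to length(). Whether this is really what happens inside my filter is an assumption.

```java
public class BufferAppendSketch {
    // Hypothetical stand-in for a reusable term buffer: the array is larger
    // than the text it currently holds, and unused slots stay '\u0000'.
    static char[] paddedBuffer(String text, int capacity) {
        char[] buf = new char[capacity];
        text.getChars(0, text.length(), buf, 0);
        return buf;
    }

    // Appends the entire array, including the unused trailing slots.
    static String appendWhole(char[] buf) {
        return new StringBuilder().append(buf).toString();
    }

    // Appends only the first `length` characters of the array.
    static String appendValid(char[] buf, int length) {
        return new StringBuilder().append(buf, 0, length).toString();
    }

    public static void main(String[] args) {
        char[] buf = paddedBuffer("booh", 14);
        System.out.println(appendWhole(buf).length());    // 14
        System.out.println(appendValid(buf, 4).length()); // 4
    }
}
```

If the assumption holds, passing an offset and a length to append, as in appendValid above, would copy only the valid characters.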
This is my incrementToken() function:

    public boolean incrementToken() throws IOException {
        //if (!input.incrementToken()) {
        //    return false;
        //}
        if (finished) {
            logger.error("Finished");
            return false;
        }
        logger.error("Starting");
        StringBuilder buffer = new StringBuilder();
        int length = 0;
        while (input.incrementToken()) {
            logger.error(Integer.toString(buffer.length()));
            logger.error(buffer.toString());
            if (0 == length) {
                buffer.append(termAtt.buffer());
                length += termAtt.length();
            } else {
                buffer.append(" ").append(termAtt.buffer());
                length += termAtt.length() + 1;
            }
        }
        logger.error("####### Final");
        logger.error(Integer.toString(buffer.length()));
        logger.error(Integer.toString(length));
        logger.error(buffer.toString());
        termAtt.setEmpty().append(buffer);
        offsetAtt.setOffset(0, length);
        finished = true;
        return true;
    }

Output for the input tokens "booh" and "good" is:

    SEVERE: Starting
    Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
    SEVERE: 0
    Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
    SEVERE:
    Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
    SEVERE: 14
    Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
    SEVERE: booh
    Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
    SEVERE: ####### Final
    Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
    SEVERE: 29
    Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
    SEVERE: 9
    Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
    SEVERE: booh good
    Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
    SEVERE: Finished

Note that the StringBuilder reports a length of 14 after the first token and 29 at the end, while the length I track myself is 9.

And this is how it appears on the Solr analysis page (http://localhost:8983/solr/admin/analysis.jsp):

    org.ctown.solr.analysis.CTConcatFilterFactory {luceneMatchVersion=LUCENE_34}
    position     1
    term text    booh#0;#0;#0;#0;#0;#0;#0;#0;#0;#0; good#0;#0;#0;#0;#0;#0;#0;#0;#0;#0;
    startOffset  0
    endOffset    9

Kindly help me understand what I am doing wrong and how to fix this.

--
View this message in context: http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3383684.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.