Thanks a million Uwe. That fixes it.

On Sat, Oct 1, 2011 at 4:16 AM, Uwe Schindler [via Lucene] <ml-node+s472066n3383905...@n3.nabble.com> wrote:
> Hi,
>
> The junk is appended here: buffer.append(termAtt.buffer());
>
> I assume you are on Lucene 3.1+, so use buffer.append(termAtt); termAtt
> implements CharSequence, so it can be appended to any StringBuilder.
> The code you are using appends the whole char array, which may contain
> characters after termAtt.length().
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [hidden email]
>
> > -----Original Message-----
> > From: Jithin [mailto:[hidden email]]
> > Sent: Friday, September 30, 2011 11:12 PM
> > To: [hidden email]
> > Subject: Writing a TokenConcatenateFilter - junk characters appearing on output.
> >
> > Hi,
> > I am trying to write a TokenFilter which just concatenates all the tokens in
> > the input TokenStream.
> > The issue I am facing is that my filter is outputting certain junk characters in
> > addition to the concatenated string. I believe this is caused by StringBuilder.
> >
> > This is my incrementToken() function
> >
> > public boolean incrementToken() throws IOException {
> >     //if (!input.incrementToken()) {
> >     //    return false;
> >     //}
> >     if (finished) {
> >         logger.error("Finished");
> >         return false;
> >     }
> >     logger.error("Starting");
> >     StringBuilder buffer = new StringBuilder();
> >     int length = 0;
> >     while (input.incrementToken()) {
> >         logger.error(Integer.toString(buffer.length()));
> >         logger.error(buffer.toString());
> >         if (0 == length) {
> >             buffer.append(termAtt.buffer());
> >             length += termAtt.length();
> >         } else {
> >             buffer.append(" ").append(termAtt.buffer());
> >             length += termAtt.length() + 1;
> >         }
> >     }
> >
> >     logger.error("####### Final");
> >     logger.error(Integer.toString(buffer.length()));
> >     logger.error(Integer.toString(length));
> >     logger.error(buffer.toString());
> >
> >     termAtt.setEmpty().append(buffer);
> >     offsetAtt.setOffset(0, length);
> >     finished = true;
> >     return true;
> > }
> >
> > *Output for input tokens booh and good is*
> >
> > SEVERE: Starting
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
> > SEVERE: 0
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
> > SEVERE:
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
> > SEVERE: 14
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
> > SEVERE: booh
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
> > SEVERE: ####### Final
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
> > SEVERE: 29
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
> > SEVERE: 9
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
> > SEVERE: booh good
> > Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
> > SEVERE: Finished
> >
> > And this is how it appears on the Solr analysis
> > page (http://localhost:8983/solr/admin/analysis.jsp):
> >
> > org.ctown.solr.analysis.CTConcatFilterFactory {luceneMatchVersion=LUCENE_34}
> > position 1
> > *term text booh#0;#0;#0;#0;#0;#0;#0;#0;#0;#0; good#0;#0;#0;#0;#0;#0;#0;#0;#0;#0;*
> > startOffset 0
> > endOffset 9
> >
> > Kindly help me in understanding what I am doing wrong and how to fix this.
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3383684.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.

--
Thanks
Jithin Emmanuel

--
View this message in context: http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3384323.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
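
For anyone finding this thread later, here is a minimal plain-Java sketch of the cause Uwe describes. It does not use Lucene: the Term class is a hypothetical stand-in for the term attribute, and its capacity of 14 is an assumption chosen only to match the 4 characters plus 10 #0; entries shown in the analysis output. Appending the whole char array carries the unused NUL slots along, while appending only the first length characters does not. In the real filter, the equivalent fix is the one Uwe gives, buffer.append(termAtt).

```java
import java.util.Arrays;

public class TermBufferDemo {

    /** Hypothetical stand-in for a term attribute: a reused, oversized char[] plus a valid length. */
    static final class Term {
        // Capacity 14 is an assumption that mirrors the 4 + 10 characters seen in the thread.
        final char[] buffer = new char[14];
        int length;

        Term set(String text) {
            Arrays.fill(buffer, '\u0000');               // unused slots hold NUL characters
            text.getChars(0, text.length(), buffer, 0);  // copy the term to the front of the array
            length = text.length();
            return this;
        }
    }

    /** Mirrors the problematic code: appends the entire backing array, padding included. */
    public static String joinWholeArray(String... terms) {
        Term term = new Term();
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            term.set(t);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term.buffer);                      // appends all 14 chars every time
        }
        return sb.toString();
    }

    /** Appends only the valid prefix of the array, which is what a CharSequence view exposes. */
    public static String joinValidChars(String... terms) {
        Term term = new Term();
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            term.set(t);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term.buffer, 0, term.length);      // stops at the valid length
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWholeArray("booh", "good").length()); // prints 29
        System.out.println(joinValidChars("booh", "good").length()); // prints 9
        System.out.println(joinValidChars("booh", "good"));          // prints booh good
    }
}
```

The two lengths, 29 and 9, match the values logged by the original filter for buffer.length() and for the hand-counted length.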