Hi, I am trying to write a TokenFilter that simply concatenates all the tokens in the input TokenStream. The issue I am facing is that my filter outputs junk characters in addition to the concatenated string. I believe this is caused by StringBuilder.
This is my incrementToken() function:
public boolean incrementToken() throws IOException {
    //if (!input.incrementToken()) {
    //    return false;
    //}
    if (finished) {
        logger.error("Finished");
        return false;
    }
    logger.error("Starting");
    StringBuilder buffer = new StringBuilder();
    int length = 0;
    while (input.incrementToken()) {
        logger.error(Integer.toString(buffer.length()));
        logger.error(buffer.toString());
        if (0 == length) {
            buffer.append(termAtt.buffer());
            length += termAtt.length();
        } else {
            buffer.append(" ").append(termAtt.buffer());
            length += termAtt.length() + 1;
        }
    }
    logger.error("####### Final");
    logger.error(Integer.toString(buffer.length()));
    logger.error(Integer.toString(length));
    logger.error(buffer.toString());
    termAtt.setEmpty().append(buffer);
    offsetAtt.setOffset(0, length);
    finished = true;
    return true;
}
Output for the input tokens "booh" and "good":
SEVERE: Starting
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: 0
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE:
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: 14
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: booh
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: ####### Final
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: 29
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: 9
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: booh good
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: Finished
And this is how it appears on the Solr analysis page (http://localhost:8983/solr/admin/analysis.jsp):
org.ctown.solr.analysis.CTConcatFilterFactory {luceneMatchVersion=LUCENE_34}
position 1
term text booh#0;#0;#0;#0;#0;#0;#0;#0;#0;#0; good#0;#0;#0;#0;#0;#0;#0;#0;#0;#0;
startOffset 0
endOffset 9
Kindly help me understand what I am doing wrong and how to fix it.
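One possible explanation, going by the logged lengths: the builder already holds 14 characters after the 4-character "booh" is appended, and 29 at the end, while the counted length is only 9. That would fit termAtt.buffer() returning a backing array that is larger than the term, so that append(char[]) copies the unused trailing slots as well. The sketch below reproduces that arithmetic in plain Java without Lucene. The class name, the helper methods, and the capacity of 14 are assumptions made for illustration, not taken from the filter.

```java
public class TermBufferSketch {

    // Hypothetical stand-in for a term buffer: an array that may be larger
    // than the term it holds. Unused slots keep the default char value 0.
    static char[] paddedBuffer(String term, int capacity) {
        char[] buf = new char[capacity];
        term.getChars(0, term.length(), buf, 0);
        return buf;
    }

    // Mirrors the loop in the question: appends the whole array each time.
    static String joinWholeBuffers(String[] terms, int capacity) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(paddedBuffer(t, capacity));
        }
        return sb.toString();
    }

    // Appends only the first t.length() chars, i.e. the valid part.
    static String joinValidChars(String[] terms, int capacity) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(paddedBuffer(t, capacity), 0, t.length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] terms = {"booh", "good"};
        // Capacity 14 is assumed so the numbers line up with the log above.
        System.out.println(joinWholeBuffers(terms, 14).length()); // prints 29
        System.out.println(joinValidChars(terms, 14).length());   // prints 9
    }
}
```

If this is indeed the cause, the corresponding change in the filter would presumably be buffer.append(termAtt.buffer(), 0, termAtt.length()) in both branches, so that only the valid part of the term buffer is copied.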
--
View this message in context:
http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3383684.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
