Writing a TokenConcatenateFilter - junk characters appearing on output.

Jithin Fri, 30 Sep 2011 14:12:23 -0700

Hi,
I am trying to write a TokenFilter which just concatenates all the tokens
in the input TokenStream.
The issue I am facing is that my filter outputs junk characters in
addition to the concatenated string. I believe this is caused by
StringBuilder.


This is my incrementToken() method:

    public boolean incrementToken() throws IOException {
        //if (!input.incrementToken()) {
            //return false;
        //}
        if (finished) {
            logger.error("Finished");
            return false;
        }
        logger.error("Starting");
        StringBuilder buffer = new StringBuilder();
        int length = 0;
        while (input.incrementToken()) {
            logger.error(Integer.toString(buffer.length()));
            logger.error(buffer.toString());
            if (0 == length) {
                buffer.append(termAtt.buffer());
                length += termAtt.length();
            } else {
                buffer.append(" ").append(termAtt.buffer());
                length += termAtt.length() + 1;
            }
        }

        logger.error("####### Final");
        logger.error(Integer.toString(buffer.length()));
        logger.error(Integer.toString(length));
        logger.error(buffer.toString());

        termAtt.setEmpty().append(buffer);
        offsetAtt.setOffset(0, length);
        finished = true;
        return true;
    }


*Output for the input tokens "booh" and "good":*

SEVERE: Starting
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: 0
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE:
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: 14
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: booh
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: ####### Final
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: 29
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: 9
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: booh good
Sep 30, 2011 9:02:13 PM org.ctown.solr.analysis.CTConcatFilter incrementToken
SEVERE: Finished


And this is how it appears on the Solr analysis page
(http://localhost:8983/solr/admin/analysis.jsp):
org.ctown.solr.analysis.CTConcatFilterFactory {luceneMatchVersion=LUCENE_34}
position        1
*term text      booh#0;#0;#0;#0;#0;#0;#0;#0;#0;#0; good#0;#0;#0;#0;#0;#0;#0;#0;#0;#0;*
startOffset     0
endOffset       9

Kindly help me understand what I am doing wrong and how to fix this.
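
A possible explanation, sketched without Lucene: the logged lengths (14 after
"booh", 29 in total, against an expected 9) suggest that the whole backing
char[] is being appended, including unused slots that hold '\u0000', rather
than only the first length() characters. The class TermBufferDemo, its
methods, and the capacity of 14 are hypothetical and chosen only to reproduce
the numbers from the log above; this is an illustration of the suspected
cause, not the Lucene implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a term attribute whose char[] is larger
// than the term it currently holds.
class TermBufferDemo {

    // Simulates a reusable term buffer: the array has spare capacity and
    // the unused slots hold '\u0000' (the Java default value for char).
    // capacity must be at least term.length().
    static char[] paddedBuffer(String term, int capacity) {
        return Arrays.copyOf(term.toCharArray(), capacity);
    }

    // Appends each whole backing array, as StringBuilder.append(char[]) does.
    static String joinWholeBuffers(List<String> terms, int capacity) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(paddedBuffer(term, capacity));
        }
        return sb.toString();
    }

    // Appends only the first term.length() characters of each array,
    // using StringBuilder.append(char[], int offset, int len).
    static String joinValidChars(List<String> terms, int capacity) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(paddedBuffer(term, capacity), 0, term.length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("booh", "good");
        // Assumed capacity of 14, inferred from the logged length above.
        System.out.println(joinWholeBuffers(terms, 14).length()); // prints 29
        System.out.println(joinValidChars(terms, 14).length());   // prints 9
    }
}
```

If this is indeed the cause, the corresponding change in the filter would
probably be to append with an explicit range, for example
buffer.append(termAtt.buffer(), 0, termAtt.length()), so that only the valid
part of the term buffer is copied.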



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3383684.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

