Hi Mark,

Out of curiosity, what was your use case?

Thanks,
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Wed, Oct 31, 2012 at 10:56 PM, Mark Bennett <[email protected]> wrote:

> This filter lets you "glue" tokens back together. This has been discussed
> and posted on the list before, but this updated version uses all the
> preferred 4.x classes.
>
> Normally you wouldn't want to stick tokens back together, but if you've
> found this post, you probably have some atypical need for it (as I did)
> As an example you could:
> * Let tokenizer break up text on white spaces
> * Then lowercase
> * then remove stop words
> * ***then concatenate all the words back together into one string***
>
> You'll need:
> * ConcatFilter.java (for lucene, below)
> * ConcatFilterFactory.java (for solr, below)
> * entry in your schema
>
> schema.xml entry
> ----------
> ...
> <fieldType ...>
>   <analyzer>
>     ...
>     <filter class="solr.ConcatFilterFactory" />
>     ...
>   </analyzer>
> </fieldType>
> ...
>
> ConcatFilter.java
> -----------------
> package org.apache.lucene.analysis;
> import java.io.IOException;
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
> public class ConcatFilter extends TokenFilter {
>   protected CharTermAttribute charTermAttr;
>   public ConcatFilter(TokenStream input) {
>     super(input);
>     charTermAttr = addAttribute( CharTermAttribute.class );
>   }
>   @Override
>   public boolean incrementToken() throws IOException {
>     StringBuilder buffer = new StringBuilder();
>     while( input.incrementToken() ) {
>       buffer.append( charTermAttr );
>     }
>     // We need to clear it either way
>     charTermAttr.setEmpty();
>     if ( buffer.length() > 0 ) {
>       charTermAttr.append( buffer );
>       return true;
>     }
>     else {
>       return false;
>     }
>   }
> }
>
> ConcatFilterFactory.java
> ------------------------
> package org.apache.solr.analysis;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.util.TokenFilterFactory;
> public class ConcatFilterFactory extends TokenFilterFactory {
>   @Override
>   public TokenStream create(TokenStream stream) {
>     return new ConcatFilter(stream);
>   }
> }
>
>
> --
> Mark Bennett / New Idea Engineering, Inc. / [email protected]
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>
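
For anyone following the thread without a Lucene build at hand, the four steps in the example above (split on whitespace, lowercase, drop stop words, glue everything back together) can be sketched in plain Java. This is only an illustration of the intended end result, not the Lucene filter itself: the class name ConcatSketch, the method concat, and the stop-word set are hypothetical, and a real analyzer chain may tokenize and lowercase differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the analyzer chain described in the post;
// it does not use any Lucene or Solr classes.
final class ConcatSketch {

    // Splits on whitespace, lowercases each token, skips stop words,
    // and appends the surviving tokens with no separator.
    static String concat(String text, Set<String> stopWords) {
        StringBuilder buffer = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            String lowered = token.toLowerCase(Locale.ROOT);
            if (lowered.isEmpty() || stopWords.contains(lowered)) {
                continue; // nothing to append for blank input or stop words
            }
            buffer.append(lowered);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Assumed stop-word list, chosen only for this example.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and"));
        // prints "quickbrownfoxlazydog"
        System.out.println(concat("The Quick Brown Fox and the Lazy Dog", stopWords));
    }
}
```

As in the posted filter, the tokens are joined with no separator, so the whole field value ends up as a single term.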
