Have you ever wished you could retract a question sent to a mailing list? For anyone who read my question - yes, I do know the difference between a bitwise "and" and a bitwise "or", and how they should be used when combining flags... Sorry for the noise.
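In case it helps the archives, the mistake can be sketched in a few lines of plain Java. The constants below are illustrative stand-ins, not Lucene's actual values; the only assumption is the usual bit-flag convention that each option is a distinct power of two. AND-ing distinct single-bit flags yields 0 (no options enabled), while OR-ing accumulates them:

```java
// Minimal sketch with stand-in flag values (NOT the real Lucene constants).
// Bit flags are conventionally distinct powers of two.
public class FlagDemo {
    static final int GENERATE_WORD_PARTS = 1;
    static final int CATENATE_WORDS      = 2;
    static final int PRESERVE_ORIGINAL   = 4;

    public static void main(String[] args) {
        // Bitwise AND of distinct single-bit flags shares no bits: result is 0,
        // i.e. every option silently disabled.
        int wrong = GENERATE_WORD_PARTS & CATENATE_WORDS & PRESERVE_ORIGINAL;

        // Bitwise OR accumulates every bit: all three options enabled.
        int right = GENERATE_WORD_PARTS | CATENATE_WORDS | PRESERVE_ORIGINAL;

        System.out.println(wrong); // 0
        System.out.println(right); // 7
    }
}
```

So in the analyzer quoted below, replacing each `&` between the `WordDelimiterFilter` flags with `|` is the fix.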
--Jeremy

On Sun, Dec 23, 2012 at 11:56 AM, Jeremy Long <jeremy.l...@gmail.com> wrote:
> Hello,
>
> I'm having an issue creating a custom analyzer utilizing the
> WordDelimiterFilter. I'm attempting to create an index of information
> gleaned from JAR manifest files. So if I have "spring-framework" I need the
> following tokens indexed: "spring" "springframework" "framework"
> "spring-framework". My understanding is that the WordDelimiterFilter is
> perfect for this. However, when I introduce the filter to the analyzer I
> don't seem to get any documents indexed correctly.
>
> Here is the analyzer:
>
> import java.io.Reader;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.core.WhitespaceTokenizer;
> import org.apache.lucene.analysis.core.LowerCaseFilter;
> import org.apache.lucene.analysis.core.StopAnalyzer;
> import org.apache.lucene.analysis.core.StopFilter;
> import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
> import org.apache.lucene.util.Version;
>
> public class FieldAnalyzer extends Analyzer {
>
>     private Version version = null;
>
>     public FieldAnalyzer(Version version) {
>         this.version = version;
>     }
>
>     @Override
>     protected TokenStreamComponents createComponents(String fieldName,
>             Reader reader) {
>
>         Tokenizer source = new WhitespaceTokenizer(version, reader);
>         TokenStream stream = source;
>
>         stream = new WordDelimiterFilter(stream,
>                 WordDelimiterFilter.CATENATE_WORDS
>                 & WordDelimiterFilter.GENERATE_WORD_PARTS
>                 & WordDelimiterFilter.PRESERVE_ORIGINAL
>                 & WordDelimiterFilter.SPLIT_ON_CASE_CHANGE
>                 & WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE, null);
>
>         stream = new LowerCaseFilter(version, stream);
>         stream = new StopFilter(version, stream,
>                 StopAnalyzer.ENGLISH_STOP_WORDS_SET);
>
>         return new TokenStreamComponents(source, stream);
>     }
> }
>
> //-------------------------------------------------
>
> Performing a very simple test results in zero documents found:
>
>         Analyzer analyzer = new FieldAnalyzer(Version.LUCENE_40);
>         Directory index = new RAMDirectory();
>
>         String text = "spring-framework";
>         String field = "field";
>
>         IndexWriterConfig config = new
>                 IndexWriterConfig(Version.LUCENE_40, analyzer);
>         IndexWriter w = new IndexWriter(index, config);
>         Document doc = new Document();
>         doc.add(new TextField(field, text, Field.Store.YES));
>         w.addDocument(doc);
>         w.close();
>
>         String querystr = "spring-framework";
>         Query q = new AnalyzingQueryParser(Version.LUCENE_40, field,
>                 analyzer).parse(querystr);
>         int hitsPerPage = 10;
>
>         IndexReader reader = DirectoryReader.open(index);
>         IndexSearcher searcher = new IndexSearcher(reader);
>         TopScoreDocCollector collector =
>                 TopScoreDocCollector.create(hitsPerPage, true);
>         searcher.search(q, collector);
>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>
>         System.out.println("Found " + hits.length + " hits.");
>         for (int i = 0; i < hits.length; ++i) {
>             int docId = hits[i].doc;
>             Document d = searcher.doc(docId);
>             System.out.println((i + 1) + ". " + d.get(field));
>         }
>
> Any idea what I've done wrong? If I comment out the addition of
> WordDelimiterFilter - the search works.
>
> Thanks in advance,
>
> Jeremy
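For completeness, the zero-hit result is consistent with how bit-flag configuration is conventionally read back: an option counts as enabled only when its bit survives in the combined value. The sketch below uses a hypothetical `has` helper and stand-in constants (this is not Lucene's actual source); with the AND-combined configuration every option reads as disabled, so a filter configured that way would emit no tokens and nothing would be indexed:

```java
// Sketch of the conventional flag-membership test. The helper and the
// constant values are illustrative assumptions, not Lucene's real code.
public class FlagCheckDemo {
    static final int GENERATE_WORD_PARTS = 1;
    static final int CATENATE_WORDS      = 2;
    static final int PRESERVE_ORIGINAL   = 4;

    // An option is "on" only when its bit is set in the combined config.
    static boolean has(int config, int flag) {
        return (config & flag) != 0;
    }

    public static void main(String[] args) {
        int andConfig = GENERATE_WORD_PARTS & CATENATE_WORDS & PRESERVE_ORIGINAL; // 0
        int orConfig  = GENERATE_WORD_PARTS | CATENATE_WORDS | PRESERVE_ORIGINAL; // 7

        // AND-combined config: every check fails, so no tokens are emitted.
        System.out.println(has(andConfig, GENERATE_WORD_PARTS)); // false
        // OR-combined config: the option is correctly detected as enabled.
        System.out.println(has(orConfig, GENERATE_WORD_PARTS));  // true
    }
}
```

That matches the symptom in the thread: with the filter in place nothing is indexed at all, and removing the filter (or combining the flags with `|`) makes the search work.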