It's always amazing to me how hitting the "send" button makes the solution obvious... too late <G>.
Been there, done that... more times than I want to count.

Best,
Erick

On Sun, Dec 23, 2012 at 2:30 PM, Jeremy Long <jeremy.l...@gmail.com> wrote:
> Have you ever wished you could retract your question to a mailing list? And
> for anyone who read my question - yes, I do know the difference between a
> bitwise "and" and a bitwise "or" and how they should be used when combining
> flags... Sorry for the spam.
>
> --Jeremy
>
> On Sun, Dec 23, 2012 at 11:56 AM, Jeremy Long <jeremy.l...@gmail.com> wrote:
>
> > Hello,
> >
> > I'm having an issue creating a custom analyzer utilizing the
> > WordDelimiterFilter. I'm attempting to create an index of information
> > gleaned from JAR manifest files. So if I have "spring-framework", I need
> > the following tokens indexed: "spring", "springframework", "framework",
> > and "spring-framework". My understanding is that the WordDelimiterFilter
> > is perfect for this. However, when I introduce the filter to the
> > analyzer, I don't seem to get any documents indexed correctly.
> >
> > Here is the analyzer:
> >
> > import java.io.Reader;
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.Tokenizer;
> > import org.apache.lucene.analysis.core.WhitespaceTokenizer;
> > import org.apache.lucene.analysis.core.LowerCaseFilter;
> > import org.apache.lucene.analysis.core.StopAnalyzer;
> > import org.apache.lucene.analysis.core.StopFilter;
> > import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
> > import org.apache.lucene.util.Version;
> >
> > public class FieldAnalyzer extends Analyzer {
> >
> >     private Version version = null;
> >
> >     public FieldAnalyzer(Version version) {
> >         this.version = version;
> >     }
> >
> >     @Override
> >     protected TokenStreamComponents createComponents(String fieldName,
> >             Reader reader) {
> >
> >         Tokenizer source = new WhitespaceTokenizer(version, reader);
> >         TokenStream stream = source;
> >
> >         stream = new WordDelimiterFilter(stream,
> >                 WordDelimiterFilter.CATENATE_WORDS
> >                 & WordDelimiterFilter.GENERATE_WORD_PARTS
> >                 & WordDelimiterFilter.PRESERVE_ORIGINAL
> >                 & WordDelimiterFilter.SPLIT_ON_CASE_CHANGE
> >                 & WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE, null);
> >
> >         stream = new LowerCaseFilter(version, stream);
> >         stream = new StopFilter(version, stream,
> >                 StopAnalyzer.ENGLISH_STOP_WORDS_SET);
> >
> >         return new TokenStreamComponents(source, stream);
> >     }
> > }
> >
> > //-------------------------------------------------
> >
> > Performing a very simple test results in zero documents found:
> >
> > Analyzer analyzer = new FieldAnalyzer(Version.LUCENE_40);
> > Directory index = new RAMDirectory();
> >
> > String text = "spring-framework";
> > String field = "field";
> >
> > IndexWriterConfig config = new
> >         IndexWriterConfig(Version.LUCENE_40, analyzer);
> > IndexWriter w = new IndexWriter(index, config);
> > Document doc = new Document();
> > doc.add(new TextField(field, text, Field.Store.YES));
> > w.addDocument(doc);
> > w.close();
> >
> > String querystr = "spring-framework";
> > Query q = new AnalyzingQueryParser(Version.LUCENE_40, field,
> >         analyzer).parse(querystr);
> > int hitsPerPage = 10;
> >
> > IndexReader reader = DirectoryReader.open(index);
> > IndexSearcher searcher = new IndexSearcher(reader);
> > TopScoreDocCollector collector =
> >         TopScoreDocCollector.create(hitsPerPage, true);
> > searcher.search(q, collector);
> > ScoreDoc[] hits = collector.topDocs().scoreDocs;
> >
> > System.out.println("Found " + hits.length + " hits.");
> > for (int i = 0; i < hits.length; ++i) {
> >     int docId = hits[i].doc;
> >     Document d = searcher.doc(docId);
> >     System.out.println((i + 1) + ". " + d.get(field));
> > }
> >
> > Any idea what I've done wrong? If I comment out the addition of the
> > WordDelimiterFilter, the search works.
> >
> > Thanks in advance,
> >
> > Jeremy
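[Editor's note] For readers who land here with the same symptom: the bug Jeremy retracts is that the `WordDelimiterFilter` configuration flags, which are distinct single-bit masks, are combined with bitwise AND. ANDing distinct power-of-two values always yields 0, so no options are enabled at all; bitwise OR is needed to accumulate them. A minimal, self-contained sketch of the difference, using stand-in constants named after the real flags (the values here are illustrative, following the power-of-two convention such flag sets use):

```java
// Stand-in flag constants mirroring WordDelimiterFilter's names.
// Each occupies its own bit; the exact values are illustrative.
public class FlagDemo {
    static final int GENERATE_WORD_PARTS = 1;
    static final int CATENATE_WORDS = 4;
    static final int PRESERVE_ORIGINAL = 32;

    // Bitwise AND of distinct single-bit flags clears every bit:
    // the result is 0, i.e. no options enabled.
    static int andFlags() {
        return CATENATE_WORDS & GENERATE_WORD_PARTS & PRESERVE_ORIGINAL;
    }

    // Bitwise OR accumulates the bits: all three options enabled.
    static int orFlags() {
        return CATENATE_WORDS | GENERATE_WORD_PARTS | PRESERVE_ORIGINAL;
    }

    public static void main(String[] args) {
        System.out.println("AND: " + andFlags()); // prints "AND: 0"
        System.out.println("OR:  " + orFlags()); // prints "OR:  37"
    }
}
```

In the analyzer above, the fix is simply replacing each `&` with `|` in the `WordDelimiterFilter` constructor's flags argument; with the flags ANDed to 0, the filter generates none of the expected tokens, which is why the search found nothing.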