It's always amazing to me how hitting the "send" button makes the solution obvious... too late <G>.
Been there, done that... more times than I want to count.

Best,
Erick

On Sun, Dec 23, 2012 at 2:30 PM, Jeremy Long <jeremy.l...@gmail.com> wrote:
> Have you ever wished you could retract your question to a mailing list? And
> for anyone who read my question - yes, I do know the difference between a
> bitwise "and" and a bitwise "or" and how they should be used when combining
> flags... Sorry for the spam.
>
> --Jeremy
>
> On Sun, Dec 23, 2012 at 11:56 AM, Jeremy Long <jeremy.l...@gmail.com> wrote:
>
> > Hello,
> >
> > I'm having an issue creating a custom analyzer utilizing the
> > WordDelimiterFilter. I'm attempting to create an index of information
> > gleaned from JAR manifest files. So if I have "spring-framework", I need
> > the following tokens indexed: "spring", "springframework", "framework",
> > and "spring-framework". My understanding is that the WordDelimiterFilter
> > is perfect for this. However, when I introduce the filter to the
> > analyzer, I don't seem to get any documents indexed correctly.
> >
> > Here is the analyzer:
> >
> > import java.io.Reader;
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.Tokenizer;
> > import org.apache.lucene.analysis.core.WhitespaceTokenizer;
> > import org.apache.lucene.analysis.core.LowerCaseFilter;
> > import org.apache.lucene.analysis.core.StopAnalyzer;
> > import org.apache.lucene.analysis.core.StopFilter;
> > import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
> > import org.apache.lucene.util.Version;
> >
> > public class FieldAnalyzer extends Analyzer {
> >
> >     private Version version = null;
> >
> >     public FieldAnalyzer(Version version) {
> >         this.version = version;
> >     }
> >
> >     @Override
> >     protected TokenStreamComponents createComponents(String fieldName,
> >             Reader reader) {
> >
> >         Tokenizer source = new WhitespaceTokenizer(version, reader);
> >         TokenStream stream = source;
> >
> >         stream = new WordDelimiterFilter(stream,
> >                 WordDelimiterFilter.CATENATE_WORDS
> >                 & WordDelimiterFilter.GENERATE_WORD_PARTS
> >                 & WordDelimiterFilter.PRESERVE_ORIGINAL
> >                 & WordDelimiterFilter.SPLIT_ON_CASE_CHANGE
> >                 & WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE, null);
> >
> >         stream = new LowerCaseFilter(version, stream);
> >         stream = new StopFilter(version, stream,
> >                 StopAnalyzer.ENGLISH_STOP_WORDS_SET);
> >
> >         return new TokenStreamComponents(source, stream);
> >     }
> > }
> >
> > //-------------------------------------------------
> >
> > Performing a very simple test results in zero documents found:
> >
> > Analyzer analyzer = new FieldAnalyzer(Version.LUCENE_40);
> > Directory index = new RAMDirectory();
> >
> > String text = "spring-framework";
> > String field = "field";
> >
> > IndexWriterConfig config = new
> >         IndexWriterConfig(Version.LUCENE_40, analyzer);
> > IndexWriter w = new IndexWriter(index, config);
> > Document doc = new Document();
> > doc.add(new TextField(field, text, Field.Store.YES));
> > w.addDocument(doc);
> > w.close();
> >
> > String querystr = "spring-framework";
> > Query q = new AnalyzingQueryParser(Version.LUCENE_40, field,
> >         analyzer).parse(querystr);
> > int hitsPerPage = 10;
> >
> > IndexReader reader = DirectoryReader.open(index);
> > IndexSearcher searcher = new IndexSearcher(reader);
> > TopScoreDocCollector collector =
> >         TopScoreDocCollector.create(hitsPerPage, true);
> > searcher.search(q, collector);
> > ScoreDoc[] hits = collector.topDocs().scoreDocs;
> >
> > System.out.println("Found " + hits.length + " hits.");
> > for (int i = 0; i < hits.length; ++i) {
> >     int docId = hits[i].doc;
> >     Document d = searcher.doc(docId);
> >     System.out.println((i + 1) + ". " + d.get(field));
> > }
> >
> > Any idea what I've done wrong? If I comment out the addition of the
> > WordDelimiterFilter, the search works.
> >
> > Thanks in advance,
> >
> > Jeremy
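[Editor's note] For readers who land here with the same symptom: the bug Jeremy retracts is that the `WordDelimiterFilter` configuration flags, which are distinct single-bit masks, are combined with bitwise AND. ANDing distinct power-of-two values always yields 0, so no options are enabled at all; bitwise OR is needed to accumulate them. A minimal, self-contained sketch of the difference, using stand-in constants named after the real flags (the values here are illustrative, following the power-of-two convention such flag sets use):

```java
// Stand-in flag constants mirroring WordDelimiterFilter's names.
// Each occupies its own bit; the exact values are illustrative.
public class FlagDemo {
    static final int GENERATE_WORD_PARTS = 1;
    static final int CATENATE_WORDS = 4;
    static final int PRESERVE_ORIGINAL = 32;

    // Bitwise AND of distinct single-bit flags clears every bit:
    // the result is 0, i.e. no options enabled.
    static int andFlags() {
        return CATENATE_WORDS & GENERATE_WORD_PARTS & PRESERVE_ORIGINAL;
    }

    // Bitwise OR accumulates the bits: all three options enabled.
    static int orFlags() {
        return CATENATE_WORDS | GENERATE_WORD_PARTS | PRESERVE_ORIGINAL;
    }

    public static void main(String[] args) {
        System.out.println("AND: " + andFlags()); // prints "AND: 0"
        System.out.println("OR:  " + orFlags()); // prints "OR:  37"
    }
}
```

In the analyzer above, the fix is simply replacing each `&` with `|` in the `WordDelimiterFilter` constructor's flags argument; with the flags ANDed to 0, the filter generates none of the expected tokens, which is why the search found nothing.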