Hi, I am indexing three words in a document. Then I run phrase queries against that document, searching for two of the words at a time and for all three at a time. I use PorterStemFilter for both indexing and searching, and I am getting very inconsistent results. Am I doing something incorrectly? I add the stemmer by overriding the tokenStream() method of StandardAnalyzer and appending PorterStemFilter to the chain. If I use a plain StandardAnalyzer, everything works fine, so I suspect the problem is in the way I create the analyzer. I printed the position increments, offsets, etc. for both cases and did not see any difference.
Below are the tests I am running and the full code.

Tests:

  Indexed content         Search                  Result
  "one two three"         "one two"               no documents found
  "one two three"         "one two three"         no documents found
  "first second third"    "first second"          one document found
  "first second third"    "first second third"    one document found
  "good bad ugly"         "good bad"              one document found
  "good bad ugly"         "good bad ugly"         no documents found

Code:

  import org.apache.lucene.store.RAMDirectory;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.PorterStemFilter;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.queryParser.ParseException;

  import java.io.Reader;
  import java.io.IOException;

  public class TestPorterStemmer {

      public static void main(String[] args) throws IOException, ParseException {
          // Index a single document with the stemming analyzer.
          RAMDirectory index = new RAMDirectory();
          IndexWriter writer = new IndexWriter(index, getAnalyzer(), true,
                  IndexWriter.MaxFieldLength.UNLIMITED);
          Document doc = new Document();
          doc.add(new Field("content", "good bad ugly",
                  Field.Store.YES, Field.Index.ANALYZED));
          writer.addDocument(doc);
          writer.optimize();
          writer.close();

          // Search with the same analyzer: two-word phrase, then three-word phrase.
          IndexSearcher searcher = new IndexSearcher(index);
          QueryParser parser = new QueryParser("content", getAnalyzer());

          Query query = parser.parse("\"" + "good bad" + "\"");
          Hits hits = searcher.search(query);
          System.out.println("searched for " + query.toString()
                  + " matched : " + hits.length() + " documents ");

          query = parser.parse("\"" + "good bad ugly" + "\"");
          hits = searcher.search(query);
          System.out.println("searched for " + query.toString()
                  + " matched : " + hits.length() + " documents ");
      }

      // StandardAnalyzer with PorterStemFilter appended to its token chain.
      public static StandardAnalyzer getAnalyzer() {
          return new StandardAnalyzer() {
              public TokenStream tokenStream(String fieldName, Reader reader) {
                  TokenStream result = super.tokenStream(fieldName, reader);
                  return new PorterStemFilter(result);
              }
          };
      }
  }

Output:

  searched for content:"good bad" matched : 1 documents
  searched for content:"good bad ugli" matched : 0 documents

Any help is greatly appreciated.

Thanks,
Preetam
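
P.S. One pattern in the results above might be worth checking: the phrases that fail all seem to contain a word that the stemmer changes ("ugly" shows up as "ugli" in the printed query), while the phrases that match may pass through the stemmer unchanged. That would be consistent with the query being stemmed but the indexed terms not being stemmed, for example if indexing goes through a different analyzer method (such as reusableTokenStream()) than the tokenStream() I overrode. This is only a hypothesis, not a confirmed diagnosis. The plain-Java sketch below does not use Lucene at all; toyStem(), tokenize() and matchesPhrase() are hypothetical helpers invented for illustration, and toyStem() knows only the single stem visible in the output above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PhraseMismatchSketch {

    // Hypothetical stand-in for PorterStemFilter: it handles only the one
    // stem that is visible in the printed query ("ugly" -> "ugli").
    static String toyStem(String term) {
        return term.equals("ugly") ? "ugli" : term;
    }

    // Lowercase, split on whitespace, and optionally apply the toy stemmer.
    static List<String> tokenize(String text, boolean stem) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            terms.add(stem ? toyStem(t) : t);
        }
        return terms;
    }

    // Exact phrase match: query terms must occur at consecutive positions.
    static boolean matchesPhrase(List<String> indexed, List<String> query) {
        return Collections.indexOfSubList(indexed, query) >= 0;
    }

    public static void main(String[] args) {
        String content = "good bad ugly";
        String[] phrases = {"good bad", "good bad ugly"};

        // Assumed scenario: index side NOT stemmed, query side stemmed.
        List<String> unstemmedIndex = tokenize(content, false);
        // Expected scenario: both sides stemmed.
        List<String> stemmedIndex = tokenize(content, true);

        for (String phrase : phrases) {
            List<String> query = tokenize(phrase, true);
            System.out.println(query
                    + " vs unstemmed index: " + matchesPhrase(unstemmedIndex, query)
                    + ", vs stemmed index: " + matchesPhrase(stemmedIndex, query));
        }
    }
}
```

In this toy model the unstemmed index matches "good bad" but not "good bad ugly", which is the same pattern as my real output, while the stemmed index matches both. Printing the terms that actually ended up in the index, and comparing them with the terms in the parsed query, would confirm or rule this out.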