Your output says you couldn't find "ugli", but you indexed "ugly". I
assume that's just a typo, and the stemmer probably makes it moot
anyway....
I don't see anything obvious in the code, but here's what I'd suggest...
1> Write this out to an FSDirectory rather than a RAMDirectory, get a copy
of Luke (google "lucene luke") and examine what's actually in your index.
2> query.toString() is your friend for finding out how the query is
actually passed to the searcher.
3> You are explicitly doing phrase queries by quoting the string, but
that should be OK.
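
To make the "ugly" vs. "ugli" point concrete, here is a minimal sketch in plain Java (no Lucene involved). Everything in it is made up for illustration: toyStem() is a hypothetical stand-in that only rewrites a trailing "y" to "i" (it is NOT the Porter algorithm), and analyze()/phraseMatches() are toy helpers, not Lucene APIs. It just shows that a phrase can only match when the terms stored in the index and the terms in the query went through the same processing, which is why looking at the index with Luke is worth the effort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PhraseMatchSketch {

    // Hypothetical toy stemmer: rewrites a trailing "y" to "i" and nothing else.
    // It only mimics the "ugly" -> "ugli" rewrite seen in the output.
    public static String toyStem(String token) {
        if (token.length() > 2 && token.endsWith("y")) {
            return token.substring(0, token.length() - 1) + "i";
        }
        return token;
    }

    // Split on whitespace and optionally run each token through the toy stemmer.
    public static List<String> analyze(String text, boolean stem) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(stem ? toyStem(token) : token);
        }
        return terms;
    }

    // A phrase matches if its terms appear consecutively, in order, in the indexed terms.
    public static boolean phraseMatches(List<String> indexed, List<String> phrase) {
        return Collections.indexOfSubList(indexed, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> stemmedIndex = analyze("good bad ugly", true);
        List<String> unstemmedIndex = analyze("good bad ugly", false);
        List<String> shortQuery = analyze("good bad", true);
        List<String> fullQuery = analyze("good bad ugly", true);

        System.out.println("stemmed index,   query " + fullQuery + ": "
                + phraseMatches(stemmedIndex, fullQuery));
        System.out.println("unstemmed index, query " + shortQuery + ": "
                + phraseMatches(unstemmedIndex, shortQuery));
        System.out.println("unstemmed index, query " + fullQuery + ": "
                + phraseMatches(unstemmedIndex, fullQuery));
    }
}
```

In this toy setup the stemmed query only fails against the unstemmed index when one of its terms is actually changed by the stemmer. That is just one situation that would produce a pattern like yours; whether it is what is happening in your index is exactly what Luke will tell you.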
FWIW
Erick
On Sun, Dec 7, 2008 at 1:26 PM, Preetam Rao <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am indexing three words in a document.
> Then I run a phrase query on that document, searching for two words at a
> time and three words at a time.
> I use PorterStemFilter for both searching and indexing. I am getting very
> inconsistent results. Am I doing something incorrectly?
> The way I use the Porter stemmer is by overriding the tokenStream() method
> of StandardAnalyzer and adding PorterStemFilter to the chain.
> If I use StandardAnalyzer everything works fine. I suspect the problem is
> in the way I am creating the analyzer.
> I printed position increments, offsets, etc. for both cases and did not see
> any difference.
>
> Below are the tests I am running and the full code.
>
> tests:
> Indexed content: "one two three"  search: "one two"  no documents found
> Indexed content: "one two three"  search: "one two three"  no documents found
>
> Indexed content: "first second third"  search: "first second"  one document found
> Indexed content: "first second third"  search: "first second third"  one document found
>
> Indexed content: "good bad ugly"  search: "good bad"  one document found
> Indexed content: "good bad ugly"  search: "good bad ugly"  no documents found
>
>
> The code is below:
>
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.PorterStemFilter;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.Hits;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.queryParser.ParseException;
> import java.io.Reader;
> import java.io.IOException;
>
>
> public class TestPorterStemmer {
>     public static void main(String[] args) throws IOException, ParseException {
>         // Index a single document into an in-memory directory.
>         RAMDirectory index = new RAMDirectory();
>         IndexWriter writer = new IndexWriter(index, getAnalyzer(), true,
>                 IndexWriter.MaxFieldLength.UNLIMITED);
>         Document doc = new Document();
>         doc.add(new Field("content", "good bad ugly", Field.Store.YES,
>                 Field.Index.ANALYZED));
>         writer.addDocument(doc);
>         writer.optimize();
>         writer.close();
>
>         // Search with the same analyzer that was used for indexing.
>         IndexSearcher searcher = new IndexSearcher(index);
>         QueryParser parser = new QueryParser("content", getAnalyzer());
>
>         Query query = parser.parse("\"" + "good bad" + "\"");
>         Hits hits = searcher.search(query);
>         System.out.println("searched for " + query.toString() + " matched : "
>                 + hits.length() + " documents ");
>
>         query = parser.parse("\"" + "good bad ugly" + "\"");
>         hits = searcher.search(query);
>         System.out.println("searched for " + query.toString() + " matched : "
>                 + hits.length() + " documents ");
>     }
>
>     // StandardAnalyzer with a PorterStemFilter appended to the token chain.
>     public static StandardAnalyzer getAnalyzer() {
>         return new StandardAnalyzer() {
>             public TokenStream tokenStream(String fieldName, Reader reader) {
>                 TokenStream result = super.tokenStream(fieldName, reader);
>                 return new PorterStemFilter(result);
>             }
>         };
>     }
> }
>
> output:
> searched for content:"good bad" matched : 1 documents
> searched for content:"good bad ugli" matched : 0 documents
>
> Any help is greatly appreciated...
>
> Thanks
> Preetam
>