Re: issues with wildcard search and snowball english analyzer

Andrew Gilmartin Thu, 24 Jul 2008 16:26:28 -0700

--- On Thu, 7/24/08, JBTech <[EMAIL PROTECTED]> wrote:

> Is there a way to avoid stemming in certain cases?


As a general rule, make the query intelligent and not the index. Therefore, 
index your text verbatim. Small changes like changing terms to lowercase and 
removing possessives are fine. You now have an index upon which you can make 
intelligent queries.
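
As a rough sketch of what "small changes" could look like, here is a minimal 
normalizer that only lowercases and strips possessives. The names normalize 
and index_terms are made up for illustration and are not part of Lucene; in 
Lucene itself the analyzer does this work.

```python
import re
from typing import List

# Matches a trailing possessive marker: "elephant's" or "dogs'".
_POSSESSIVE = re.compile(r"(?:'s|')$")

# A word, optionally followed by an apostrophe and more letters.
_TOKEN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]*)?")


def normalize(token: str) -> str:
    """Lowercase a token and drop a trailing possessive. No stemming."""
    return _POSSESSIVE.sub("", token.lower())


def index_terms(text: str) -> List[str]:
    """Split text into near-verbatim terms suitable for indexing."""
    return [normalize(t) for t in _TOKEN.findall(text)]
```

Note that this naive rule also turns "it's" into "it", so treat it as a 
starting point only.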

An intelligent query requires keeping track of several collections of 
term-to-term(s) mappings, for example, stemmed-term to verbatim-term(s). Now, 
convert the user's search for "elephant is a big animal" into something akin to 

( (elephant^10) OR (A) OR (B) ) AND
( (big^10) OR (C) ) AND
( (animal^10) OR (D) )

Where A and B are other terms with the same stemming as elephant, C is another 
term with the same stemming as big, and D is another term with the same 
stemming as animal. Adding the boost ensures that a verbatim match pushes the 
document's rank higher and so ensures that what the user asked for is closer to 
the top.
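
The expansion above can be sketched roughly as follows. Everything here is an 
assumption for illustration: toy_stem is a crude stand-in for a real stemmer 
such as Snowball, STOP_WORDS is a guess at why "is" and "a" are absent from 
the query, and the output is just a query string that is not escaped for 
special characters.

```python
from collections import defaultdict

STOP_WORDS = {"is", "a"}  # assumption: dropped before expansion


def toy_stem(term):
    """Hypothetical stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def build_stem_map(vocabulary):
    """Map each stem to the set of verbatim terms that share it."""
    stem_map = defaultdict(set)
    for term in vocabulary:
        stem_map[toy_stem(term)].add(term)
    return stem_map


def expand_query(user_query, stem_map, boost=10):
    """Boost each verbatim word and OR in the other terms with its stem."""
    clauses = []
    for word in user_query.lower().split():
        if word in STOP_WORDS:
            continue
        variants = sorted(stem_map.get(toy_stem(word), set()) - {word})
        parts = ["(%s^%d)" % (word, boost)] + ["(%s)" % v for v in variants]
        clauses.append("( " + " OR ".join(parts) + " )")
    return " AND ".join(clauses)


vocabulary = ["elephant", "elephants", "big", "animal", "animals"]
query = expand_query("elephant is a big animal", build_stem_map(vocabulary))
```

In practice the stem map would be built from the terms actually present in 
the index, so the query only expands to terms that can match.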

This basic idea of making the queries more intelligent by broadening them and 
boosting term weights gives you a lot of control over the query and how results 
are ranked. The same control is not possible by making the index more 
intelligent.

Don't worry about Lucene's performance with complex queries. My experience is 
that it is very fast.

And to answer your specific question: with a verbatim index, a search for 
"e*t" will work as is, since the pattern is matched against unstemmed terms.
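
To picture why, a wildcard pattern is compared against the terms as they sit 
in the index. The snippet below only mimics that comparison with Python's 
fnmatch over a made-up term list; matching_terms is a hypothetical helper, 
not how Lucene implements wildcard queries.

```python
from fnmatch import fnmatchcase


def matching_terms(pattern, indexed_terms):
    """Return the indexed terms that the wildcard pattern matches."""
    return [t for t in indexed_terms if fnmatchcase(t, pattern)]


# Assumed verbatim (unstemmed, lowercased) terms from an index.
verbatim_terms = ["elephant", "elephants", "element", "big", "exit"]
hits = matching_terms("e*t", verbatim_terms)
```

With verbatim terms the pattern sees the words exactly as the user would 
expect them to be spelled.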

-- Andrew
