--- On Thu, 7/24/08, JBTech <[EMAIL PROTECTED]> wrote:
> Is there a way to avoid stemming in certain cases?
As a general rule, make the query intelligent, not the index. So index your text verbatim. Small normalizations such as lowercasing terms and removing possessives are fine. That gives you an index against which you can run intelligent queries.

An intelligent query requires keeping track of several collections of term-to-term(s) mappings, for example stemmed term to verbatim term(s). With those in hand, convert the user's search for "elephant is a big animal" into something akin to

( (elephant^10) OR (A) OR (B) ) AND ( (big^10) OR (C) ) AND ( (animal^10) OR (D) )

where A and B are other terms with the same stem as elephant, C is another term with the same stem as big, and D is another term with the same stem as animal. The boost ensures that a verbatim match pushes the document's rank higher, so what the user actually asked for ends up closer to the top.

This basic idea of making queries more intelligent by broadening them and boosting term weights gives you a lot of control over the query and over how results are ranked. The same control is not possible by making the index more intelligent.

Don't worry about Lucene's performance with complex queries. In my experience it is very fast.

And to answer your specific question, a search for "e*t" will work as is.

-- Andrew
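For illustration, here is a minimal sketch of the query-broadening idea, assuming a plain-Java setting with no Lucene dependency. It builds a stemmed-term to verbatim-terms map from the vocabulary of a verbatim index, then expands each word of the user's query into a group of the form (word^boost OR variant ...) and ANDs the groups together, producing a string in the classic Lucene query syntax used above. The class and method names, the toy stemmer (which only strips a trailing "s"), and the stop-word list are hypothetical. The stop words are there because the example query above drops "is" and "a". A real setup would reuse the same stemmer as the analyzer.

```java
import java.util.*;

class BoostedStemExpansion {

    // Hypothetical toy stemmer, for illustration only: strips a trailing "s".
    // A real system would use the same stemmer as its analyzer.
    static String stem(String term) {
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Builds the stemmed-term -> verbatim-term(s) mapping from the
    // vocabulary of the verbatim index.
    static Map<String, SortedSet<String>> buildStemMap(Collection<String> vocabulary) {
        Map<String, SortedSet<String>> map = new HashMap<>();
        for (String term : vocabulary) {
            map.computeIfAbsent(stem(term), k -> new TreeSet<>()).add(term);
        }
        return map;
    }

    // Expands each non-stop word into "( (word^boost) OR (variant) ... )"
    // and joins the groups with AND, so every user word must match in some form.
    static String expand(String userQuery, Map<String, SortedSet<String>> stemMap,
                         Set<String> stopWords, int boost) {
        List<String> groups = new ArrayList<>();
        for (String word : userQuery.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty() || stopWords.contains(word)) {
                continue;
            }
            // The verbatim term gets the boost so exact matches rank higher.
            StringBuilder group = new StringBuilder("( (" + word + "^" + boost + ")");
            for (String variant : stemMap.getOrDefault(stem(word), Collections.emptySortedSet())) {
                if (!variant.equals(word)) {
                    group.append(" OR (").append(variant).append(")");
                }
            }
            group.append(" )");
            groups.add(group.toString());
        }
        return String.join(" AND ", groups);
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("elephant", "elephants", "big", "animal", "animals");
        Set<String> stopWords = new HashSet<>(Arrays.asList("is", "a", "the"));
        System.out.println(expand("elephant is a big animal", buildStemMap(vocabulary), stopWords, 10));
        // prints: ( (elephant^10) OR (elephants) ) AND ( (big^10) ) AND ( (animal^10) OR (animals) )
    }
}
```

The resulting string could then be handed to Lucene's query parser. Building the query as a string keeps the sketch independent of any particular Lucene version. The same structure can also be assembled directly with Lucene's boolean query classes.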
