Hi there, I started using Lucene not long ago, with plans to replace the current SQL queries in my application with it. As I wasn't aware of Lucene before, I had already implemented some tools (filters) similar to the ones Lucene includes.
For example, I have implemented a "stop word" tool. In my case it has many more configuration options than Lucene's: it can remove sub-strings in addition to complete tokens, and I can configure the required location of the sub-string within the token, or even the location of the token within the phrase.

I have also implemented a synonym (substitution) mechanism that can likewise be configured by location within a phrase, and that can match synonyms while taking spelling mistakes into account. It doesn't expand, though; it only transforms to one particular replacement. It can find replacements for sub-strings as well, so I can use it to separate words. For example, in German I have "strasse" => " strasse" (with a space in front), so a word like "mainstrasse" is split into "main" and "strasse".

I am wondering: can I run my "standardization" tools before calling the Lucene indexing, without implementing any custom analyzers, and achieve more or less the same results? What do I "lose" if I go this way? The stemming filters are really the one thing I didn't have and will use.

Is there any point in creating custom analyzers with filters for stop words and synonyms, and in implementing my own "sub-string" filter for separating tokens into "sub-words" (like "mainstrasse" => "main", "strasse")? Rough sketches of both approaches follow below.

Thanks in advance
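To make the first option concrete, this is roughly the "standardize before indexing" setup I have in mind. It is only a minimal sketch against a recent Lucene API (exact imports vary between versions); normalizeText() stands in for my own stop word / synonym / splitting tools, and the index path and field name are placeholders.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class PreprocessThenIndex {
    public static void main(String[] args) throws Exception {
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("index")),
                new IndexWriterConfig(new StandardAnalyzer()))) {
            String raw = "Er wohnt in der Mainstrasse";
            // My own tools run first: stop words, synonyms,
            // and sub-string splitting ("mainstrasse" -> "main strasse").
            String standardized = normalizeText(raw);
            Document doc = new Document();
            doc.add(new TextField("body", standardized, Field.Store.YES));
            writer.addDocument(doc);
        }
    }

    // Placeholder for my existing standardization pipeline.
    static String normalizeText(String s) {
        return s.toLowerCase().replace("strasse", " strasse").trim();
    }
}

My suspicion about what I'd lose with this approach: whatever rewriting happens at index time has to be repeated on every query string, and any character offsets used for highlighting would point into the rewritten text rather than the original. Is that the main cost, or is there more?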
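And this is what I understand the custom-analyzer alternative would look like. Again only a sketch: the stop word set is a made-up example, and the package locations of StopFilter/LowerCaseFilter differ between Lucene versions.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;

import java.util.Arrays;

public class MyAnalyzer extends Analyzer {
    // Example stop words only; my real list (and its location rules)
    // would have to be mapped onto this somehow.
    private static final CharArraySet STOP_WORDS =
            new CharArraySet(Arrays.asList("der", "die", "das"), true);

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream stream = new LowerCaseFilter(source);
        stream = new StopFilter(stream, STOP_WORDS);
        // My own sub-string splitter would slot in here (sketched further below).
        stream = new CompoundSuffixSplitFilter(stream, "strasse");
        return new TokenStreamComponents(source, stream);
    }
}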
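The sub-string filter itself might look something like the sketch below: a TokenFilter that splits one fixed, known suffix off a token, so "mainstrasse" comes out as "main" followed by "strasse". CompoundSuffixSplitFilter is just my own name for it, not an existing Lucene class, and a real version would take a dictionary of suffixes rather than a single string. I believe Lucene also ships compound-word token filters aimed at exactly this German decompounding case, which might already cover it.

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

import java.io.IOException;

/** Splits a known suffix off a token: "mainstrasse" -> "main", "strasse". */
public final class CompoundSuffixSplitFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncAtt =
            addAttribute(PositionIncrementAttribute.class);
    private final String suffix;   // e.g. "strasse"
    private State pendingState;    // captured attributes of the split token
    private boolean suffixPending; // true while the suffix still has to be emitted

    public CompoundSuffixSplitFilter(TokenStream input, String suffix) {
        super(input);
        this.suffix = suffix;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (suffixPending) {
            // Emit the queued suffix as a token of its own, one position
            // after the prefix (offsets left untouched for simplicity).
            restoreState(pendingState);
            termAtt.setEmpty().append(suffix);
            posIncAtt.setPositionIncrement(1);
            suffixPending = false;
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        String term = termAtt.toString();
        if (term.length() > suffix.length() && term.endsWith(suffix)) {
            pendingState = captureState(); // remember the original token
            termAtt.setLength(term.length() - suffix.length()); // emit prefix now
            suffixPending = true;
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        suffixPending = false;
        pendingState = null;
    }
}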