Good Afternoon Everyone,
I have two issues with the StandardAnalyzer that I have been trying to solve, and I keep getting stuck:

- Tokenizing strings that the standard grammar treats as serial numbers. For example, "ABC-2007-5-22" is stored as the single token "ABC-2007-5-22" instead of the four tokens "ABC", "2007", "5" and "22".
- Getting mixed-case "and"s (e.g. "and", "And") recognized as the boolean operator "AND".

On the first issue, I have been looking at the StandardAnalyzer.jj file in the "\Lucene.Net\Analysis\Standard" folder. I see that this file holds the JavaCC grammar the analyzer uses to parse tokens, and I am wondering how it gets compiled into the C# DLL. I am also wondering how I can use StandardAnalyzer.jj to solve the first issue. From looking at the file, it appears that the "<NUM>" grammar rule is the one that defines a serial number as a single token. If I remove it from the array that defines the grammar, will the tokenizer split the strings the way I want? Any other ideas would be greatly appreciated.

On the second issue, I am trying to avoid calling string.Replace on the user's query. Hopefully there is some method in the QueryParser that enables mixed-case "and"s.

In case it helps, I am using Lucene 1.9.1 in Visual Studio 2005, compiling for the .NET 2.0 Framework.

Thanks,
Christopher A. David
Software Engineer
Snapstream Media
http://www.snapstream.com
http://www.couchville.com
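For reference, the splitting wanted in the first issue ("ABC-2007-5-22" becoming "ABC", "2007", "5", "22") can be illustrated with a small standalone sketch. It does not use the Lucene.Net API: it is written in Java (the language the original Lucene grammar targets) and assumes the separators are the punctuation characters - _ / . , that glue serial-number tokens together. In Lucene.Net the same logic would presumably sit in a custom token filter placed after the StandardTokenizer, as an alternative to editing the grammar; the class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one already-tokenized string on punctuation.
public class CompoundTokenSplitter {
    // Assumed separator set; adjust to whatever characters should break tokens.
    private static final String SEPARATORS = "-_/.,";

    /** Splits a token on separator characters, dropping empty pieces. */
    public static List<String> splitCompound(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (SEPARATORS.indexOf(c) >= 0) {
                // A separator ends the current piece (if any).
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // Flush the trailing piece.
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitCompound("ABC-2007-5-22")); // prints [ABC, 2007, 5, 22]
    }
}
```

Splitting after tokenization, rather than removing the <NUM> rule, would leave the rest of the grammar's behavior untouched; whether that trade-off suits this index is still open.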
