Lower Case ANDs and Serial Numbers

Chris David Mon, 11 Jun 2007 14:36:13 -0700

Good Afternoon Everyone,


I have been trying to solve a couple of issues and keep getting stuck.

Both issues involve the StandardAnalyzer. I am trying to:

 - Tokenize strings that the standard grammar is considering serial
numbers, e.g.  "ABC-2007-5-22" is being stored as "ABC-2007-5-22"
instead of "ABC" "2007" "5" "22".

 - Get the analyzer to recognize mixed case "and"s as "AND"

 

On the first issue of tokenizing strings I have been looking at the
StandardAnalyzer.jj file located in the "\Lucene.Net\Analysis\Standard"
folder.  I see that this file holds the JavaCC grammar the analyzer uses
to parse tokens.  I am wondering how this file gets compiled into the C#
dll.  

 

My other question about this file is how I can change it to solve my
first issue.  From looking at the file, it appears that the "<NUM>"
grammar rule is the rule that defines a serial number as a single
token.  If I remove this from the array that defines the grammar, will
the tokenizer split the strings the way I am looking for?  Any other
ideas would be greatly appreciated.
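
To make the desired result concrete, here is a minimal sketch in plain
Java of the split being asked for.  It is not Lucene code and does not
touch the grammar file; the class and method names are hypothetical and
only illustrate the target output for "ABC-2007-5-22".

```java
import java.util.ArrayList;
import java.util.List;

public class SerialSplitSketch {

    // Hypothetical helper (not a Lucene API): break the text at every run
    // of characters that is neither a letter nor a digit.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            // A leading delimiter produces an empty first element; skip it.
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [ABC, 2007, 5, 22]
        System.out.println(split("ABC-2007-5-22"));
    }
}
```

An analyzer that produced these four terms at both index time and query
time would give the behavior described above.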

 

On the second issue I am trying to avoid string.replace'ing the user
input query.  Hopefully there is some method in the QueryAnalyzer to
enable mixed case "and"s.
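
For comparison, the pre-processing fallback I would like to avoid could
look roughly like the sketch below.  It is plain Java with hypothetical
names, not a Lucene API, and it assumes that double quotes in the query
are balanced and never escaped.  It upper-cases a standalone "and" only
outside quoted phrases.

```java
import java.util.regex.Pattern;

public class AndOperatorSketch {

    // Matches the whole word "and" in any letter case.
    private static final Pattern AND_WORD =
            Pattern.compile("\\band\\b", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: rewrite and/And/aNd to AND outside quoted phrases.
    static String normalize(String query) {
        // Splitting on the quote character yields alternating pieces:
        // even indexes are outside quotes, odd indexes are inside.
        String[] pieces = query.split("\"", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                out.append('"'); // restore the quote removed by split
            }
            if (i % 2 == 0) {
                out.append(AND_WORD.matcher(pieces[i]).replaceAll("AND"));
            } else {
                out.append(pieces[i]); // leave phrase text untouched
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: cats AND dogs AND "salt and pepper"
        System.out.println(normalize("cats And dogs and \"salt and pepper\""));
    }
}
```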

 

If this helps I am using Lucene 1.9.1 on Visual Studio 2005 and
compiling for the .NET 2.0 Framework.

 

Thanks,

Christopher A. David
Software Engineer
Snapstream Media
http://www.snapstream.com
http://www.couchville.com
