Searching with Special Characters

Trevor Watson Fri, 28 Aug 2009 08:19:35 -0700

Hello folks,

We are currently using Lucene.Net to search a Lucene index built from a MySQL database. The index builds fine and searching it is going quite well. However, we need to search for characters that Lucene strips out automatically.

For example, "asdf23(4)" becomes two separate terms, "asdf23" and "4". When we search for "asdf23\(4\)" (with backslashes included so that the parentheses remain in the search query), we get no results. This is because, when the text is added to the index, the parentheses are stripped out and what remains is split into individual terms.
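To make the splitting concrete, here is a simplified model in C#. It is not Lucene.Net's tokenizer: `TokenizerSketch`, `SplitOnNonAlphanumeric` and `SplitOnWhitespace` are hypothetical helpers, and the model assumes the analyzer breaks text at every character that is not a letter or digit, which matches the behavior described above for "asdf23(4)". Under that assumption, the backslash-escaped form yields exactly the same two terms, while a whitespace-only split keeps the whole string as one term.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified tokenizers for illustration only;
// they do not reproduce any specific Lucene.Net analyzer.
public static class TokenizerSketch
{
    // Treats every character that is not a letter or digit as a separator.
    public static List<string> SplitOnNonAlphanumeric(string text)
    {
        var terms = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                // A separator ends the current term; the separator itself is dropped.
                terms.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
            terms.Add(current.ToString());
        return terms;
    }

    // Splits only on whitespace, so punctuation stays inside the term.
    public static List<string> SplitOnWhitespace(string text)
    {
        return new List<string>(text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SplitOnNonAlphanumeric("asdf23(4)")));    // asdf23, 4
        Console.WriteLine(string.Join(", ", SplitOnNonAlphanumeric(@"asdf23\(4\)"))); // asdf23, 4
        Console.WriteLine(string.Join(", ", SplitOnWhitespace("asdf23(4)")));         // asdf23(4)
    }
}
```

In this model the backslashes are just more separators, so escaping the text cannot keep the parentheses in a term; only a different splitting rule changes the result.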

Is there a way to stop Lucene from splitting that into individual terms?

The code we use to add documents is as follows:
[start code]

string[] sReplace = new string[] { "\\", "+", "-", "&&", "||", "!", "(", ")", "{", "}", "[", "]", "^", "\"", "~", "*", "?", ":" };

// Backslash-escape Lucene query-syntax characters ("\\" first, so added backslashes are not re-escaped).
foreach (string sReplaceTerm in sReplace)
   sInsert = sInsert.Replace(sReplaceTerm, "\\" + sReplaceTerm);

doc.Add(new Lucene.Net.Documents.Field(
    dr["FieldName"].ToString(),
    sInsert,
    Lucene.Net.Documents.Field.Store.YES,
    Lucene.Net.Documents.Field.Index.TOKENIZED,
    Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS));

[end code]
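Running the escaping loop on its own shows the exact text handed to the Field constructor. The characters in sReplace are the special characters of Lucene's query syntax, so the loop turns "asdf23(4)" into "asdf23\(4\)" before indexing; because the field uses Field.Store.YES, that escaped form (backslashes included) is also what gets stored. `EscapeSketch` and `EscapeLikeIndexer` are hypothetical names; the character list and loop are copied from the code above.

```csharp
using System;

// Hypothetical wrapper around the escaping loop from the indexing code.
public static class EscapeSketch
{
    // Same character list as sReplace in the indexing code.
    static readonly string[] SReplace =
        { "\\", "+", "-", "&&", "||", "!", "(", ")", "{", "}", "[", "]", "^", "\"", "~", "*", "?", ":" };

    public static string EscapeLikeIndexer(string sInsert)
    {
        // Backslash is replaced first, so the backslashes added for
        // later characters are not themselves escaped again.
        foreach (string sReplaceTerm in SReplace)
            sInsert = sInsert.Replace(sReplaceTerm, "\\" + sReplaceTerm);
        return sInsert;
    }

    public static void Main()
    {
        Console.WriteLine(EscapeLikeIndexer("asdf23(4)")); // asdf23\(4\)
    }
}
```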

Thanks in advance,

Trevor Watson
