This is caused by using QueryParser to construct the query; exactly how the 
text is broken into words is determined by the analyzer that you selected.

To avoid word breaking and symbol replacement, you could use a different 
analyzer, but it would be better to construct the query directly using the 
BooleanQuery, TermQuery and related classes.  The latter is preferred because 
some symbols (for example "+" and "-") are an essential part of the query 
syntax that QueryParser recognizes.  For example, when run through QueryParser, 
the search [ +red +blue -green ] is identical to the search [ red AND blue NOT 
green ].

To directly construct a search that does not strip out the "+" symbol, you 
could do something like this to search for the string "red+green" in a given 
field:

Query query = new TermQuery(new Term(searchField, "red+green"));

The [ red AND blue NOT green ] search from above would be constructed like this:

BooleanQuery query = new BooleanQuery();
query.Add(new TermQuery(new Term(searchField, "red")), BooleanClause.Occur.MUST);
query.Add(new TermQuery(new Term(searchField, "blue")), BooleanClause.Occur.MUST);
query.Add(new TermQuery(new Term(searchField, "green")), BooleanClause.Occur.MUST_NOT);
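
If the required and excluded keywords come from user input, the same 
construction can be wrapped in a small helper.  This is only a sketch, assuming 
the Lucene.Net 2.x API used above; the class name ExactTermQueries, the method 
Build and the field name "content" are made up for illustration.  Each keyword 
is passed to TermQuery verbatim, so nothing is parsed or analyzed.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class ExactTermQueries
{
    // Hypothetical helper (not part of Lucene.Net): every keyword becomes
    // exactly one term, so symbols such as "+" or "-" are kept as-is.
    public static BooleanQuery Build(string field, string[] required, string[] prohibited)
    {
        BooleanQuery query = new BooleanQuery();
        foreach (string text in required)
        {
            // Documents must contain this exact term.
            query.Add(new TermQuery(new Term(field, text)), BooleanClause.Occur.MUST);
        }
        foreach (string text in prohibited)
        {
            // Documents must not contain this exact term.
            query.Add(new TermQuery(new Term(field, text)), BooleanClause.Occur.MUST_NOT);
        }
        return query;
    }

    public static void Main()
    {
        // "content" is an assumed field name.
        Query query = Build("content",
                            new string[] { "red+green", "blue" },
                            new string[] { "green" });
        // Print the query in Lucene's own notation to check what was built.
        Console.WriteLine(query.ToString());
    }
}
```

Printing the query is a quick way to confirm that "red+green" stayed a single 
term rather than being split.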

One other consideration: the analyzer used to add documents to the Lucene 
index also determines how the original content is broken into searchable 
terms.  If I recall correctly, the StandardAnalyzer will keep the special 
symbols that comprise a phone number together as a searchable unit; this may 
not be true for other analyzers.

There is a very useful tool called Luke that can be used to inspect an index 
and run trial searches using different analyzers.

Hope this helps.

-- Neal

-----Original Message-----
From: Li Bing [mailto:[email protected]]
Sent: Thursday, August 20, 2009 12:33 AM
To: [email protected]
Subject: Lucene Query Questions

Dear all,

I am using the following code to search indexed data. However, when
the searchKeyword contains special characters such as "//", ":",
"+", "-", ".", or even digits, the query removes some required
characters or splits the keyword. Sometimes this yields no results
although I am sure matching documents exist. Can I disable this
behavior so that the query does not change my original searchKeyword?

        ......
        IndexSearcher searcher = new IndexSearcher(fsDirectory);
        Analyzer chineseAnalyzer = new ChineseAnalyzer();
        QueryParser queryParser = new QueryParser(searchField, chineseAnalyzer);
        Query query = queryParser.Parse(DBTools.FilterKeyFieldValue(searchKeyword));
        Hits results = searcher.Search(query);
        ......

Thanks so much!
LB
