Does a token of "united states" exist in index if using standard analyzer. My understanding is, united and states are separately stored in index, but not as "united states". So, if I build a query like Query q = qp.parse("\"united states\""); It would not return any result. Am I right?

Ian

--------------------------------------------------
From: "Samarendra Pratap" <samarz...@gmail.com>
Sent: Friday, April 16, 2010 9:02 PM
To: <java-user@lucene.apache.org>
Subject: Re: about analyzer for searching location

Hi. I don't think you need a different analyzer. Read about
PhraseQuery<http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/PhraseQuery.html>.
If you are using parse() method of QueryParser. Enclose the searched string
in extra double quotes, which must obviously be escaped.

Query q = qp.parse("\"united states\"");


2010/4/15 Ian.huang <yiwong2...@hotmail.com>

Hi All,

I am implementing a search function for address by hibernate search which
is based on lucene. The class definition as following:

@Indexed
public class Address implements Cloneable
{
@DocumentId
private int id;
@Field
private String addrCountry;
private String addrDesc;
@Field
private String addrLineOne;
private String addrLineTwo;
@Field
private String addrCity;
......

As you see, addrCountry, addrLineone and addrCity are fields for search. I
am using default analyzer in index & search. So I think country name like
United States would be indexed as two terms United, and states.

In addition, during search, a search keyword like United states, or Salt
lake city would be tokenized as two or three single words.

As result, any address fields contain united, city would be returned. like
United Kingdom, but actually I want to get a result of united states.

My expected result as following:

if someone searches for "united" it should return "united states" and
"united kingdom".

if someone searches for "united states" it should return "united states",
and not "united kingdom".

I hope the analyzer can generate term with multiple words. say, united
states to united states. I think standardanalyzer would analyze united
states to united and states?

A different example: if search keyword is parking lot in Salt Lake City,
the generated terms to search need to be: parking lot and Salt Lake City,
not parking,lot,salt,lake and city.

I wonder if any analyzer can help me to implement my requirement. It would
be better to use dictionary based solution, then I can manage some search
terms that could have multiple words.

thanks

Ian




--
Regards,
Samar


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to