Re: Phrase Search

Andrzej Bialecki Mon, 18 Jun 2007 07:24:02 -0700

Erick Erickson wrote:

Phrase queries won't help you here....


Your particular issue can be addressed, but I'm not sure it's a
reasonable long-term solution....

If you indexed your address field as UN_TOKENIZED, and
did NOT tokenize your query, it should give you what you want.
What's happening is that StandardAnalyzer is indexing individual
tokens, not phrases. So, doc 1 has the tokens
"hiran", "magri"

Doc 2 has the tokens
"hiran", "magri", "sec", and "10"

and so on...

Searching, even for phrases, on "hiran magri" matches
4 docs because those two tokens appear next to each other
in each of them.

If, on the other hand, you index your address field UN_TOKENIZED,
then doc1 has a "token" of "hiran magri", while doc 2 has a token
of "hiran magri sec 10". Doc2 won't match a query on
"hiran magri" etc.

But, this may not be a good solution because searching on
"hiran" won't match *any* document. You might have to index
the same fields two different ways to get all the behavior you
want.
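
To make the difference concrete, here is a minimal sketch in plain
Java. It is a toy model, not Lucene code: the class name, the
whitespace "analyzer", and both matching helpers are made up for
illustration and only mimic how a tokenized field and an untokenized
field behave for the addresses discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: plain Java that imitates the two indexing modes.
// None of these names come from the Lucene API.
class TokenizedVsUntokenized {

    // Crude stand-in for an analyzer: lowercase, then split on whitespace.
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Phrase match on a tokenized field: the query tokens must appear
    // consecutively and in order somewhere in the field's token list.
    static boolean phraseMatch(List<String> fieldTokens, List<String> queryTokens) {
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(fieldTokens, queryTokens) >= 0;
    }

    // Match on an untokenized field: the whole value is one term, so the
    // query has to equal it verbatim.
    static boolean exactMatch(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    public static void main(String[] args) {
        String doc1 = "hiran magri";
        String doc2 = "hiran magri sec 10";
        String query = "hiran magri";

        // Tokenized: the phrase matches both documents.
        System.out.println(phraseMatch(tokenize(doc1), tokenize(query))); // true
        System.out.println(phraseMatch(tokenize(doc2), tokenize(query))); // true

        // Untokenized: only the document whose whole value equals the query matches.
        System.out.println(exactMatch(doc1, query));   // true
        System.out.println(exactMatch(doc2, query));   // false

        // Untokenized: a single word no longer matches anything.
        System.out.println(exactMatch(doc1, "hiran")); // false
    }
}
```

The last line shows the drawback mentioned above: with only the
untokenized form, "hiran" alone finds nothing, which is why indexing
the field both ways may be needed.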

Another good old trick is to index field values (tokenized) with
appended special starting and ending tokens, e.g. instead of
"Hiran Magri" use "_start_ Hiran Magri _end_". Then you can query for
fields that are exactly equal to a phrase, while still retaining the
possibility to search by individual terms and phrases not equal to the
field value.
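
A minimal sketch of the start/end token trick, again as a toy model in
plain Java rather than Lucene code. The class and helper names are
hypothetical, and it assumes the analyzer used in a real index would
leave the "_start_" and "_end_" tokens intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: imitates a tokenized field wrapped in sentinel tokens.
// None of these names come from the Lucene API.
class SentinelTokenSketch {
    static final String START = "_start_";
    static final String END = "_end_";

    // Crude stand-in for an analyzer: lowercase, then split on whitespace.
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // "Index" a field value as: _start_ <tokens...> _end_
    static List<String> indexField(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(START);
        tokens.addAll(tokenize(value));
        tokens.add(END);
        return tokens;
    }

    // Ordinary phrase match: the phrase tokens appear consecutively, in order.
    static boolean phraseMatch(List<String> fieldTokens, List<String> phraseTokens) {
        return !phraseTokens.isEmpty()
                && Collections.indexOfSubList(fieldTokens, phraseTokens) >= 0;
    }

    // Exact-field match: wrap the query in the same sentinels and run it
    // as a phrase, so it only matches when nothing else is in the field.
    static boolean exactFieldMatch(List<String> fieldTokens, String query) {
        return phraseMatch(fieldTokens, indexField(query));
    }

    public static void main(String[] args) {
        List<String> doc1 = indexField("Hiran Magri");
        List<String> doc2 = indexField("Hiran Magri Sec 10");

        // Exact match: only the field that is exactly "Hiran Magri".
        System.out.println(exactFieldMatch(doc1, "hiran magri")); // true
        System.out.println(exactFieldMatch(doc2, "hiran magri")); // false

        // Plain phrase and single-term searches still work on both.
        System.out.println(phraseMatch(doc2, tokenize("hiran magri"))); // true
        System.out.println(phraseMatch(doc2, tokenize("hiran")));       // true
    }
}
```

The point of the design is that one tokenized field serves both kinds
of query, so the value does not have to be indexed twice.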



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

