Erick Erickson wrote:
Phrase queries won't help you here....
Your particular issue can be addressed, but I'm not sure it's a
reasonable long-term solution....
If you indexed your address field as UN_TOKENIZED, and
did NOT tokenize your query, it should give you what you want.
What's happening is that StandardAnalyzer is indexing indivdual
tokens, not phrases. So, doc 1 has the tokens
"hiran", "margi"
Doc 2 has tokens.
"hiran", "magri", "sec", and "10"
and so on...
Searching, even for phrases, on "hiran margi" matches
4 docs because those two tokens appear next to each other.
If, on the other hand, you index your address field UN_TOKENIZED,
then doc1 has a "token" of "hiran margi", while doc 2 has a token
of "hiran magri sec 10". Doc2 won't match a query on
"hiran margi" etc.
But, this may not be a good solution because searching on
"hiran" won't match *any* document. You might have to index
the same fields two different ways to get all the behavior you
want.
Another good old trick is to index field values (tokenized) with
appended special starting and ending tokens, e.g. instead of "Hiran
Magri" use "_start_ Hiran Magri _end_". Then you can query for fields
that are exactly equal to a phrase, while still retaining the
possibility to search by individual terms and phrases not equal to the
field value.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]