: You'll need some kind of lookup to know how to split a token like "cybercafe" into two words - once you've done that it will be easy to set the position increment of them to zero so that they overlay the original term.
but how would you set the position increment of a multi-word synonym so that phrase/span queries will work?
Assuming you have the following "phrase synonyms" (and code that can find them during analysis)...

   [CyberCafe] => [Cyber] [Cafe]
   [IBM] => [International] [Business] [Machines]
   [Cyber] [Cafe] => [CyberCafe]
   [International] [Business] [Machines] => [IBM]
and the source documents:
   1) bob bought stock in IBM for five bucks
   2) sue went to the cybercafe yesterday
   3) joe was at the cafe, cyber chating yesterday
...how would you set the position increment so that a span/phrase query for "stock in International Business Machines" would match document #1, and "cyber cafe" would match document #2 but not #3?
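To make the difficulty concrete, here is a minimal sketch of a positional index in plain Python. It is not Lucene code: `index_positions` and `phrase_match` are hypothetical helpers that only mimic how position increments place terms and how an exact phrase query needs consecutive positions. It compares two ways of injecting [Cyber] [Cafe] into document #2.

```python
from collections import defaultdict

def index_positions(tokens):
    """tokens: list of (term, position_increment). Returns term -> set of positions."""
    postings = defaultdict(set)
    pos = -1
    for term, inc in tokens:
        pos += inc
        postings[term].add(pos)
    return postings

def phrase_match(postings, phrase):
    """True if the terms of the phrase occur at consecutive positions."""
    terms = phrase.split()
    return any(
        all(start + i in postings.get(t, ()) for i, t in enumerate(terms))
        for start in postings.get(terms[0], ())
    )

# Document #2 with both synonym words overlaid on "cybercafe" (increment 0).
overlaid = index_positions([
    ("sue", 1), ("went", 1), ("to", 1), ("the", 1),
    ("cybercafe", 1), ("cyber", 0), ("cafe", 0), ("yesterday", 1),
])

# Same document, but "cafe" is moved one position past "cyber".
stepped = index_positions([
    ("sue", 1), ("went", 1), ("to", 1), ("the", 1),
    ("cybercafe", 1), ("cyber", 0), ("cafe", 1), ("yesterday", 1),
])

print(phrase_match(overlaid, "cyber cafe"))          # both words share one position
print(phrase_match(stepped, "cyber cafe"))           # words are now adjacent
print(phrase_match(stepped, "cybercafe yesterday"))  # original term lost its neighbour
```

In this toy model, overlaying everything at increment zero means "cyber cafe" cannot match as a phrase, while stepping "cafe" forward makes that phrase match but pushes "yesterday" away from the original "cybercafe" token. Either choice breaks some exact phrase, which is the problem being asked about.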
On further thought, my approach would be to handle this on the analysis side and not deal with position increments. The lookup would take "cyber cafe" and emit the token "cybercafe". In your #3 example, the tokens would be [cafe] [cyber] and would not match. If someone issued a phrase query for "cyber cafe" the same analysis would turn that into a query for "cybercafe".
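As a rough illustration of that idea, the sketch below collapses known multi-word phrases into a single alias token and runs the same analysis over documents and queries. It is plain Python, not a Lucene Analyzer; the `PHRASES` table, `analyze`, and `phrase_in` are assumptions made up for this example, and the lowercasing and punctuation stripping are simplifications.

```python
# Hypothetical lookup table: multi-word phrase -> single alias token.
PHRASES = {
    ("cyber", "cafe"): "cybercafe",
    ("international", "business", "machines"): "ibm",
}
MAX_LEN = max(len(key) for key in PHRASES)

def analyze(text):
    """Lowercase, strip simple punctuation, and collapse known phrases."""
    words = [w.strip(".,").lower() for w in text.split()]
    words = [w for w in words if w]
    out = []
    i = 0
    while i < len(words):
        # Try the longest phrase first so longer aliases win.
        for n in range(MAX_LEN, 1, -1):
            if i + n > len(words):
                continue
            alias = PHRASES.get(tuple(words[i:i + n]))
            if alias:
                out.append(alias)
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return out

def phrase_in(doc_tokens, query_tokens):
    """True if query_tokens appear as a contiguous run in doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

docs = [
    "bob bought stock in IBM for five bucks",
    "sue went to the cybercafe yesterday",
    "joe was at the cafe, cyber chating yesterday",
]
analyzed = [analyze(d) for d in docs]

query = analyze("cyber cafe")  # becomes a single token
print([phrase_in(d, query) for d in analyzed])

query = analyze("stock in International Business Machines")
print([phrase_in(d, query) for d in analyzed])
```

Because the query goes through the same `analyze` step, "cyber cafe" is searched as the single token "cybercafe", so document #3, where the words appear in the other order, is not touched by the rewrite and does not match.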
What drawbacks are there to replacing multiple words with their corresponding acronym/alias during analysis?
the only thing that's ever occured to me is to set the position incriment
I can't help myself, I'm working with the spell checker as we speak.... incrEment :)
Erik