Re: How to include a multi-word synonym to a word when indexing?

Erik Hatcher Tue, 12 Apr 2005 06:37:21 -0700

On Apr 12, 2005, at 1:42 AM, Chris Hostetter wrote:


: You'll need some kind of lookup to know how to split a token like
: "cybercafe" into two words - once you've done that it will be easy to
: set the position increment of them to zero so that they overlay the
: original term.

but how would you set the position increment of a multi-word synonym so
that phrase/span queries will work?

Assuming you have the following "phrase synonyms" (and code that
can find them during Analysis)...

   [CyberCafe] => [Cyber] [Cafe]
   [IBM] => [International] [Business] [Machines]
   [Cyber] [Cafe] => [CyberCafe]
   [International] [Business] [Machines] => [IBM]

and the source documents:

1) bob bought stock in IBM for five bucks
2) sue went to the cybercafe yesterday
3) joe was at the cafe, cyber chatting yesterday

...how would you set the position incriment so that a span/phrase query
for "stock in International Business Machines" would match document #1,
and "cyber cafe" would match document #2 but not #3 ?

On further thought, my approach would be to handle this on the analysis side and not deal with position increments. The lookup would take "cyber cafe" and emit the token "cybercafe". In your #3 example, the tokens would be [cafe] [cyber] and would not match. If someone issued a phrase query for "cyber cafe" the same analysis would turn that into a query for "cybercafe".
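
To make the idea concrete, here is a minimal sketch of that lookup in plain Java. It is not Lucene code: the PhraseCollapser class and its add/collapse methods are hypothetical names made up for illustration, and it assumes the tokens have already been lowercased and stripped of punctuation. In a real analyzer the same logic would sit in a token filter and run at both index time and query time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: collapses known multi-word
// phrases into a single alias token.
public class PhraseCollapser {
    // Lookup table: word sequence -> single replacement token.
    private final Map<List<String>, String> phrases = new HashMap<>();
    private int maxLen = 0;

    public void add(String alias, String... words) {
        phrases.put(Arrays.asList(words), alias);
        maxLen = Math.max(maxLen, words.length);
    }

    // At each position, replace the longest known phrase with its alias;
    // tokens that start no known phrase pass through unchanged.
    public List<String> collapse(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String alias = null;
            int len = Math.min(maxLen, tokens.size() - i);
            for (; len >= 1; len--) {
                alias = phrases.get(tokens.subList(i, i + len));
                if (alias != null) {
                    break;
                }
            }
            if (alias != null) {
                out.add(alias);
                i += len;
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PhraseCollapser pc = new PhraseCollapser();
        pc.add("cybercafe", "cyber", "cafe");
        pc.add("ibm", "international", "business", "machines");

        // Query side: the phrase "cyber cafe" becomes the single term "cybercafe".
        System.out.println(pc.collapse(Arrays.asList("cyber", "cafe")));
        // Document #3: "cafe" precedes "cyber", so nothing is collapsed.
        System.out.println(pc.collapse(Arrays.asList(
                "joe", "was", "at", "the", "cafe", "cyber", "chatting", "yesterday")));
    }
}
```

Because the document and the query go through the same collapse step, a query for "cyber cafe" ends up as the single term "cybercafe" and matches document #2, while document #3 keeps [cafe] [cyber] as separate tokens and does not match.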

What drawbacks are there to replacing multiple words with their corresponding acronym/alias during analysis?

the only thing that's ever occured to me is to set the position incriment

I can't help myself, I'm working with the spell checker as we speak.... incrEment :)

        Erik

