Eric,
IMHO the number of side effects can be reduced by requiring "phrases":
the tokens for your query, "sponge bob", would then look like
"sp po on ng ge" eb "bo ob"
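A rough Python sketch of how such a query could be assembled, under one
possible reading of the above: each query word becomes a required phrase
of bi-grams, and the bi-gram that straddles the space ("eb") is left
optional. This is not Lucene code, and the names bigrams, build_query and
matches are made up for illustration.

```python
def bigrams(text):
    """Overlapping character bi-grams of text with whitespace removed."""
    chars = "".join(text.split())
    return [chars[i:i + 2] for i in range(len(chars) - 1)]


def build_query(user_query):
    """Split a query into (required_phrases, optional_tokens).

    Each word of two or more characters yields one required phrase of
    bi-grams; each space yields one optional boundary bi-gram.
    """
    words = user_query.split()
    required = [bigrams(w) for w in words if len(w) >= 2]
    optional = [a[-1] + b[0] for a, b in zip(words, words[1:])]
    return required, optional


def matches(doc_tokens, required):
    """True if every required phrase is a contiguous run in doc_tokens.

    Optional tokens are ignored here; in a real engine they would only
    influence ranking (an assumption of this sketch).
    """
    def has_run(phrase):
        n = len(phrase)
        return any(doc_tokens[i:i + n] == phrase
                   for i in range(len(doc_tokens) - n + 1))
    return all(has_run(p) for p in required)


required, optional = build_query("sponge bob")
print(required)  # [['sp', 'po', 'on', 'ng', 'ge'], ['bo', 'ob']]
print(optional)  # ['eb']
print(matches(bigrams("spongebob squarepants"), required))  # True
```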
Maxym
==================================
Maxym Mykhalchuk
(+39) 320 8593170
PhD student at University of Trento, ITALY
==================================
----- Original Message -----
From: "Eric Isakson" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, May 11, 2006 3:54 PM
Subject: RE: Searching across spaces
You might consider using overlapping bi-gram tokenization with whitespace
stripped out, combined with a PhraseQuery.
So your tokenized content, "spongebob squarepants", would look like:
sp po on ng ge eb bo ob bs sq qu ua ar re ep pa an nt ts
and your tokens for your query, "sponge bob", would look like
sp po on ng ge eb bo ob
Add each token to the PhraseQuery and you should match.
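The idea can be tried out in a few lines of Python. This is only a sketch
of the tokenization and of what a phrase match means (all query tokens in
order, with no gaps); it does not use Lucene, and the helper names bigrams
and phrase_match are invented for the example.

```python
def bigrams(text):
    """Overlapping character bi-grams of text with all whitespace removed."""
    chars = "".join(text.split())
    return [chars[i:i + 2] for i in range(len(chars) - 1)]


def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens occur as one contiguous run inside doc_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


doc = bigrams("spongebob squarepants")
query = bigrams("sponge bob")

print(" ".join(query))           # sp po on ng ge eb bo ob
print(phrase_match(doc, query))  # True
```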
This is very similar to the techniques used for searching in Asian
languages, which do not separate words with spaces. There are probably some
side effects for compound words that you didn't mean to do this to, but
without knowing the exact domain of compound words that you wish to
support, this is probably the best you will be able to do.
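To get a feel for one such side effect: a contiguous run of overlapping
bi-grams matches exactly when the whitespace-stripped query is a substring
of the whitespace-stripped document, so a plain substring test is enough to
experiment with. The Python below is a sketch on that basis, with made-up
helper names (squash, would_match), not a statement about Lucene's
behaviour.

```python
def squash(text):
    """Remove all whitespace, as the bi-gram tokenizer would."""
    return "".join(text.split())


def would_match(document, query):
    """Approximate the bi-gram phrase match with a substring test.

    Queries shorter than two characters produce no bi-grams, so they are
    treated as non-matching.
    """
    q = squash(query)
    return len(q) >= 2 and q in squash(document)


doc = "spongebob squarepants"

# Intended: the user's space falls inside a compound word.
print(would_match(doc, "sponge bob"))  # True
# Probably unintended: the query straddles the original word boundary.
print(would_match(doc, "bobs quare"))  # True
# Still rejected: the characters do not appear in this order.
print(would_match(doc, "square bob"))  # False
```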
-----Original Message-----
From: Robert Young [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 10, 2006 2:09 PM
To: [email protected]
Subject: Searching across spaces
Hi,
How can I search across spaces in the document when the spaces aren't
present in the search? For example, if the document contains "spongebob
squarepants" but the user searches on "sponge bob", I would like to get the
result.
Thanks
Rob
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]