Excerpts from John Bachir's message of Wed Jan 17 16:14:47 -0800 2007:
> Unfortunately, one of the things that the client has asked for is
> 
>    one two three
> 
> to be transformed to
> 
>    *one* *two* *three*

Ok. Then I don't think you really need to worry about escaping anything.
You can split on whitespace and wrap each token, prefixed and suffixed
with a star, in a WildcardQuery. That changes if you support phrase
queries surrounded by quotes, in which case "split on whitespace"
becomes something more complicated, or if you want to disallow
wildcards from the user, in which case you'll need to escape * and ?.
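
A minimal sketch of the whitespace approach, in plain Ruby: it only builds the pattern strings, one per token, and each would then go into its own WildcardQuery. The method names are hypothetical, and the backslash escape for * and ? is an assumption about the escape convention, not something confirmed for Ferret. Quoted phrases are deliberately not handled.

```ruby
# Hypothetical helper: backslash-escape wildcard characters (and the
# backslash itself) so user-typed * and ? are treated literally.
# ASSUMPTION: backslash is the escape character your query layer expects.
def escape_wildcards(token)
  token.gsub(/[*?\\]/) { |c| "\\#{c}" }
end

# Hypothetical helper: "one two three" -> ["*one*", "*two*", "*three*"].
# String#split with no argument splits on runs of whitespace and drops
# leading/trailing whitespace. Quoted phrases are NOT handled here.
def wildcard_patterns(input, allow_user_wildcards: true)
  input.split.map do |token|
    token = escape_wildcards(token) unless allow_user_wildcards
    "*#{token}*"
  end
end

p wildcard_patterns("one two three")
p wildcard_patterns("a*b c?", allow_user_wildcards: false)
```

Passing `allow_user_wildcards: false` is the "disallow wildcards from the user" case: only the stars added around each token act as wildcards.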

> And also to be able to transparently search FOR the special characters
> themselves. Which means I will actually not be filtering, but escaping
> the special characters. (I'm assuming Ferret has some facility for
> searching for special characters, although I admit I haven't looked
> into it much yet).

Yep, as long as your tokenizer doesn't discard them, you're fine.

Basically, if you're avoiding QueryParser and building Query objects
directly from the strings, none of these characters has special
semantics (except * and ? in a WildcardQuery).

-- 
William <[EMAIL PROTECTED]>
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
