Michiel Meeuwissen wrote:
Henk Hangyi <[EMAIL PROTECTED]> wrote:
When I search on a field in MMBase, I would like to take the special
characters into account. For instance, when I search on "Yes" I would
also like to find "Yés", "Yès", "Yês" and "Yës".
I think the actual solution might lie in the realm of full-text search
engines (I suppose their indices ignore accents in a similar fashion),
which are available for MySQL and PostgreSQL but which MMBase does not
yet support at all.
The problem is clear; however, solutions probably need quite a bit of
support from the database. Not all databases store the characters in
the same manner. For instance, your solution with the double __ only
works correctly on databases which store them as two characters, and
might not work correctly on databases which use other encoding
methods. The % solution returns too many results and might overload
the database.
The solution, as Michiel says, probably lies in the realm of full-text
search engines like Verity, Excalibur or Lucene (the last one is open
source, btw). I know that Excalibur has a concept called overlap-iso
which maps all accented characters to their non-accented counterparts.
All searches take place on the non-accented indexes it creates in
that situation. I suspect a solution like that is better for the long
run. However, it means quite a bit of work, especially if there is no
support in the database for this stuff. Support for external indexes
was taken into account in the query-project, though; I don't know how
much of that support has been fleshed out.
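
To make the idea of a non-accented index a bit more concrete, here is a
minimal sketch in Java of folding accented characters to their base
letters. It is not how Excalibur or MMBase does it; the class and method
names (AccentFold, fold, indexKey) are made up for illustration, and it
assumes a JDK that ships java.text.Normalizer (Java 6 or later).

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of MMBase: folds accented characters
// to their unaccented base letters.
public final class AccentFold {
    private AccentFold() {}

    // Decompose to NFD so that "é" becomes "e" plus a combining accent,
    // then remove all combining marks (\p{M}).
    public static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Key one could store in a separate index: folded and lower-cased.
    public static String indexKey(String s) {
        return fold(s).toLowerCase(Locale.ROOT);
    }
}
```

The same function would have to be applied to the stored text when the
index is built and to the search term at query time, so that "Yes" and
its accented variants end up with the same key. Letters that have no
Unicode decomposition (for instance "ø") pass through unchanged, so a
real mapping table would need to cover those separately.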
--
Rico Jansen ([EMAIL PROTECTED])
"You call it untidy, I call it LRU ordered" -- Daniel Barlow