Search the Internet for an SQLite extension called "unifuzz.c" and see
if it does what you want in the way of character folding.  I have a
copy of the code on my other computer if you cannot find the author's
original code.

Here is the download link:
https://dl.dropboxusercontent.com/u/26433628/unifuzz.zip

It is a very bad (yes!) collection of collations, since none of them matches any known locale's collation requirements. If you only need to collate German, but correctly, look at ICU.

What it offers is a set of locale-independent collations and functions, plus a bit more.

I wrote it after realizing there was no good way to collate and fuzzy-search text from various languages intermixed in the same column (we had customers in 49 countries and suppliers from 10). Since it's impossible to collate correctly for several languages simultaneously, the best option was to case-fold and/or unaccent the data in the least damaging way.
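To make the "case fold and/or unaccent" idea concrete, here is a minimal sketch in Python rather than C. It is not the unifuzz implementation: the helper names (fold_key, fuzzy_collation) and the collation name FUZZY are made up for illustration, and the folding relies on Python's own Unicode tables instead of the custom tries described here.

```python
import sqlite3
import unicodedata


def fold_key(text):
    """Case-fold, then drop combining marks (hypothetical helper, not unifuzz code)."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_collation(a, b):
    """Compare two strings by their folded keys; returns -1, 0 or 1."""
    ka, kb = fold_key(a), fold_key(b)
    return (ka > kb) - (ka < kb)


def demo():
    """Register the collation and run a case- and accent-insensitive lookup."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("FUZZY", fuzzy_collation)  # assumed name
    conn.execute("CREATE TABLE names (n TEXT)")
    conn.executemany(
        "INSERT INTO names VALUES (?)",
        [("Zoë",), ("élan",), ("Eve",), ("zoe",), ("ELAN",)],
    )
    rows = conn.execute(
        "SELECT n FROM names WHERE n = 'elan' COLLATE FUZZY ORDER BY rowid"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

Calling demo() returns the rows whose folded form equals "elan", whatever their case or accents. The same callable could be used with ORDER BY ... COLLATE FUZZY to sort mixed-language data in one pass.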

The (largish) code has provisions for dealing with the German eszett (ß) and several other special characters. I know it's being used in a number of countries, not all of them using the Latin script. It uses custom Unicode v5.1 tries in about 180 KB.
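Special characters need explicit handling because some letters have no canonical decomposition, so stripping combining marks alone leaves them untouched. The sketch below illustrates the problem in Python. The SPECIAL_FOLDS table is purely an assumption for illustration; the post does not list which characters unifuzz actually special-cases, and its C code uses tries rather than a dictionary.

```python
import unicodedata

# Hypothetical table of letters that do not decompose under NFD.
# The eszett is not listed because str.casefold() already maps it to "ss".
SPECIAL_FOLDS = {"ø": "o", "ł": "l", "đ": "d", "æ": "ae", "œ": "oe"}


def unaccent(text):
    """Case-fold text, apply the special table, then strip combining marks."""
    out = []
    for ch in text.casefold():
        if ch in SPECIAL_FOLDS:
            out.append(SPECIAL_FOLDS[ch])
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)
```

With this helper, unaccent("Straße") gives "strasse" through case folding alone, while "Łódź" and "Søren" only reduce to plain ASCII because of the extra table.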

It also offers collation for Unicode digits (not only 0-9) and a good share of other string functions.
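As an illustration of what collating Unicode digits can mean, the following Python sketch compares runs of decimal digits from any script by numeric value. It is an assumption about the general technique, not a description of how unifuzz does it; numeric_key and numeric_collation are hypothetical names.

```python
import re


def numeric_key(text):
    """Build a sort key in which digit runs compare by numeric value.

    In Python 3, \\d matches any Unicode decimal digit (category Nd),
    and int() accepts such digits, not only ASCII 0-9.
    """
    key = []
    for chunk in re.findall(r"\d+|\D+", text):
        if chunk[0].isdecimal():
            key.append((0, int(chunk), ""))
        else:
            key.append((1, 0, chunk.casefold()))
    return key


def numeric_collation(a, b):
    """Three-way comparison, suitable for sqlite3's create_collation()."""
    ka, kb = numeric_key(a), numeric_key(b)
    return (ka > kb) - (ka < kb)
```

With this key, "item 9" sorts between an item numbered 2 and an item numbered 10 even when those two numbers are written in Arabic-Indic digits.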

It is currently Windows-bound since it uses one Windows function, but I'm sure it can be made to work under Linux as well. Be sure to read the lengthy comment at the top of the source before using it or deciding it's worthless.

Please drop me a note if you find it useful or discover bugs. Feel free to use and abuse the code, but please don't release a distinct version under the same name.

--
j...@antichoc.net
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
