Dear all SQLite3 users,

Recently I have been working on a dictionary-style project that had to
work with non-Latin-1 UNICODE strings. I did try the ICU project, but I
wasn't satisfied with the extra baggage that came with it.
I would like to recommend the following possible solution to the
long-standing UNICODE issue. It was built as an ICU alternative
(excluding collations) and could easily be included in the SQLite core
as default behavior.

http://ioannis.mpsounds.net/blog/?dl=sqlite3_unicode.c

The above file contains mapping tables for the lower(), upper(),
title() and fold()* transformations, based on the UNICODE mapping tables
described by the UNICODE standard v5.1.0 beta. The functions use these
tables to transform characters into their respective cases.
(These tables were built with a modified version of Loic Dachary's
builder in order to include the required case transformations.)
* UNICODE uses case folding mapping tables to implement case-insensitive
comparisons (e.g. LIKE).
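To illustrate how a case folding table supports case-insensitive
comparison, here is a minimal sketch. It is not the code from
sqlite3_unicode.c: the table below is a hypothetical five-entry excerpt
(real tables cover thousands of code points), the names fold_cp() and
fold_equal() are made up for this example, and strings are assumed to be
already decoded into arrays of code points. It handles only simple 1:1
folding, so equal lengths can be checked up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of a simple case folding table (code point ->
 * folded code point), sorted by source so it can be binary-searched.
 * Values follow the "C" (common) mappings of Unicode CaseFolding.txt. */
typedef struct {
    uint32_t from;
    uint32_t to;
} FoldEntry;

static const FoldEntry fold_table[] = {
    {0x00C4, 0x00E4}, /* LATIN CAPITAL A WITH DIAERESIS -> a with diaeresis */
    {0x0391, 0x03B1}, /* GREEK CAPITAL ALPHA -> alpha */
    {0x03A3, 0x03C3}, /* GREEK CAPITAL SIGMA -> sigma */
    {0x03C2, 0x03C3}, /* GREEK SMALL FINAL SIGMA -> sigma */
    {0x0410, 0x0430}, /* CYRILLIC CAPITAL A -> a */
};

/* Returns the folded form of one code point (unchanged if not in table). */
static uint32_t fold_cp(uint32_t c)
{
    if (c >= 'A' && c <= 'Z')           /* fast path for ASCII */
        return c + ('a' - 'A');
    size_t lo = 0, hi = sizeof fold_table / sizeof fold_table[0];
    while (lo < hi) {                   /* binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (fold_table[mid].from == c)
            return fold_table[mid].to;
        if (fold_table[mid].from < c)
            lo = mid + 1;
        else
            hi = mid;
    }
    return c;
}

/* Case-insensitive equality: two strings match if their folded forms match. */
static int fold_equal(const uint32_t *a, size_t na,
                      const uint32_t *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    if (na != nb)                       /* valid only for 1:1 folding */
        return 0;
    for (size_t i = 0; i < na; i++)
        if (fold_cp(a[i]) != fold_cp(b[i]))
            return 0;
    return 1;
}
```

With this, "ΣΑ" and "σα" compare equal, because both fold to the same
code points. A LIKE override would apply the same per-character folding
while matching the % and _ wildcards.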

The above file uses the existing ICU infrastructure built into SQLite
to activate the extra functionality, which automatically:
- overrides the LIKE operator to support full UNICODE case-insensitive
comparison
- overrides upper() and lower() to support case transformation of
UNICODE characters, based on the UNICODE mapping tables described by
the UNICODE standard v5.1.0 beta
- provides title() and fold() functions, also based on the UNICODE
mapping tables described by the UNICODE standard v5.1.0 beta
- provides an unaccent() function (based on the unac library written for
Linux by Loic Dachary) that decomposes UNICODE characters into their
unaccented equivalents, allowing simpler queries that return a wider
range of results (e.g. ά -> α, æ -> ae; in the latter case the string
automatically grows by one code point)
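The unaccent() idea, including the growth by one code point for æ -> ae,
can be sketched roughly as follows. This is not the unac library or the
code in sqlite3_unicode.c. The table is a hypothetical six-entry excerpt,
unaccent() and find_unaccent() are illustrative names, and input and
output are assumed to be arrays of already-decoded code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unaccent table excerpt: each entry maps one code point to
 * a base sequence of one or two code points (0 marks an unused slot).
 * Sorted by source code point for binary search. */
typedef struct {
    uint32_t from;
    uint32_t to[2];
} UnaccentEntry;

static const UnaccentEntry unaccent_table[] = {
    {0x00C6, {0x0041, 0x0045}}, /* Æ -> AE */
    {0x00E1, {0x0061, 0}},      /* á -> a  */
    {0x00E6, {0x0061, 0x0065}}, /* æ -> ae */
    {0x00E9, {0x0065, 0}},      /* é -> e  */
    {0x03AC, {0x03B1, 0}},      /* ά -> α  */
    {0x03AD, {0x03B5, 0}},      /* έ -> ε  */
};

/* Returns the table entry for c, or NULL if c has no decomposition. */
static const UnaccentEntry *find_unaccent(uint32_t c)
{
    size_t lo = 0, hi = sizeof unaccent_table / sizeof unaccent_table[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (unaccent_table[mid].from == c)
            return &unaccent_table[mid];
        if (unaccent_table[mid].from < c)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Writes the unaccented form of in[0..n) into out (capacity cap).
 * Returns the number of code points written, or (size_t)-1 if out is
 * too small. The result can be longer than the input (e.g. æ -> ae). */
static size_t unaccent(const uint32_t *in, size_t n, uint32_t *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        const UnaccentEntry *e = find_unaccent(in[i]);
        if (e == NULL) {                 /* no mapping: copy as-is */
            if (len == cap)
                return (size_t)-1;
            out[len++] = in[i];
            continue;
        }
        for (int k = 0; k < 2 && e->to[k] != 0; k++) {
            if (len == cap)
                return (size_t)-1;
            out[len++] = e->to[k];
        }
    }
    return len;
}
```

For the input "άæx" (3 code points) this produces "αaex" (4 code
points), so the caller must size the output buffer for possible growth.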

Unlike ICU, no collation sequences have been implemented yet.
Each of the above features can be included or excluded independently,
according to specific needs, in order to minimize the size of the
library.
With all functionality enabled, the total overhead on the SQLite library
size is approximately 70-80 KB.
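One common way to make such features independently selectable is
compile-time guards, so that a disabled function's tables are never
compiled in. The sketch below only shows the pattern. The
SQLITE_UNICODE_OMIT_* macro names and has_function() are hypothetical and
are not necessarily the switches used in sqlite3_unicode.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature switches: defining one (e.g. -DSQLITE_UNICODE_OMIT_UNACCENT
 * on the compiler command line) leaves that function, and in a real build
 * its mapping tables, out of the library. */
static const char *const enabled_functions[] = {
    "lower", "upper", "like",
#ifndef SQLITE_UNICODE_OMIT_TITLE
    "title",
#endif
#ifndef SQLITE_UNICODE_OMIT_FOLD
    "fold",
#endif
#ifndef SQLITE_UNICODE_OMIT_UNACCENT
    "unaccent",
#endif
};

/* Returns 1 if the named function was compiled into this build. */
static int has_function(const char *name)
{
    assert(name != NULL);
    size_t n = sizeof enabled_functions / sizeof enabled_functions[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(enabled_functions[i], name) == 0)
            return 1;
    return 0;
}
```

In a default build with no switches defined, all six functions are
present. Each switch trims the corresponding table from the binary.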

The above file has not been thoroughly tested, but I consider the
implementation to be stable.
You can leave comments, bug reports, suggestions on this board or at
http://ioannis.mpsounds.net/blog/2007/12/19/sqlite-native-unicode-like-support
(PS. I am not an SQLite expert, but I had to improvise to some extent
on this matter.)

Thank you very much.
