Hello!
kent sin wrote:
> Dear Mr. Barkov,
>
> I have read unicode.c and would like to make the
> following changes. Before that, I would like to hear
> your advice:
>
> 1. separate unicode.c into:
>
> unicode-convert.h unicode-convert.c
> unicode.h
> unicode.c
>
> where unicode-convert.h contains all the code conversion
> tables and unicode-convert.c contains the conversion
> code. unicode.h contains those tables and
> unicode.c contains the remaining code.
This is already done in the latest CVS sources. I divided the
Unicode-related code into 3 files:
unicode.c - routines on Unicode strings
(like strdup and strcat, but for Unicode strings)
uniconv.c - character set conversion routines.
unidata.c - toupper/tolower/ctype Unicode routines.
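To illustrate what "strdup and strcat, but for Unicode strings" can look like, here is a minimal sketch. It assumes one code point per int and a terminating 0; the names uni_len, uni_dup and uni_cat are hypothetical and are not the actual functions in unicode.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers: a Unicode string is an array of int,
   one code point per element, terminated by 0. */

/* Number of code points before the terminating 0. */
size_t uni_len(const int *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Heap-allocated copy of s; the caller frees it. NULL on failure. */
int *uni_dup(const int *s)
{
    size_t bytes = (uni_len(s) + 1) * sizeof(int);
    int *copy = malloc(bytes);
    if (copy != NULL)
        memcpy(copy, s, bytes);
    return copy;
}

/* Append src to dst, terminator included.
   dst must have room for the result. */
int *uni_cat(int *dst, const int *src)
{
    memcpy(dst + uni_len(dst), src, (uni_len(src) + 1) * sizeof(int));
    return dst;
}
```

As with the char versions, the caller of uni_cat is responsible for the destination buffer size.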
> I would like to write some scripts to generate the
> two .h files from data found at unicode.org. That
> way, when a new Unicode release comes out, we only need to
> generate the new .h files and recompile.
Do you really think we wrote all the tables manually? :-)
Of course we have such scripts. I wrote 3 scripts to generate
sources for:
1. conversion routines for 8-bit charsets,
using files like this one, for example:
ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-5.TXT
2. conversion routines for multibyte charsets, using files like this:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
3. tolower/toupper/ctype routines, using this file:
ftp://ftp.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt
Those scripts are available in a separate CVS repository, "unicode",
so you can obtain them if you want. Take a look at README.CVS
and just use "unicode" instead of "mnogosearch" as the project name.
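For readers who have not seen the mapping files: each data line of an 8-bit file such as 8859-5.TXT holds the charset byte and the Unicode code point as hex numbers, followed by a "#" comment. The following is only a rough sketch of what a generator for case 1 might do, written in C for illustration. It is not one of the scripts from the "unicode" repository, and the function names are hypothetical.

```c
#include <stdio.h>

/* Parse one line of a unicode.org 8-bit mapping file, e.g.
   "0xA1\t0x0401\t#\tCYRILLIC CAPITAL LETTER IO".
   Returns 1 if the line holds a mapping, 0 for comments,
   blank lines and bytes without a code point. */
int parse_mapping_line(const char *line, unsigned *byte, unsigned *ucs)
{
    if (line[0] == '#')
        return 0;
    if (sscanf(line, "%x %x", byte, ucs) != 2)
        return 0;
    return *byte <= 0xFF && *ucs <= 0xFFFF;
}

/* Fill a 256-entry byte-to-Unicode table from a mapping file.
   Unmapped bytes stay 0 (an assumption of this sketch).
   Returns the number of mappings read. */
int load_8bit_table(FILE *in, unsigned short table[256])
{
    char line[256];
    unsigned byte, ucs;
    int count = 0;

    for (byte = 0; byte < 256; byte++)
        table[byte] = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_mapping_line(line, &byte, &ucs)) {
            table[byte] = (unsigned short)ucs;
            count++;
        }
    }
    return count;
}

/* Write the table as C source, eight entries per line. */
void emit_c_table(FILE *out, const char *name,
                  const unsigned short table[256])
{
    int i;
    fprintf(out, "static const unsigned short %s[256] = {\n", name);
    for (i = 0; i < 256; i++)
        fprintf(out, "0x%04X,%s", table[i], (i % 8 == 7) ? "\n" : " ");
    fprintf(out, "};\n");
}
```

With something like this, picking up a new mapping file means rerunning the generator and recompiling.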
> 2. add a tonormalize routine, which is similar to tolower
> but also maps characters with diacritics to their lowercase
> equivalents without diacritics. For example, it will change all of
> Ç ç C c to c.
Yes, we want to add such a feature soon. I can't guarantee
it, but I hope 3.2.4 will already have it, if we have
enough time.
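Just to make the idea concrete, a tonormalize routine could be a lookup in a table sorted by code point, in the same spirit as toupper/tolower. This is a minimal sketch with a tiny hand-written sample table; the name uni_tonormalize is hypothetical, and a real table would be generated from UnicodeData rather than typed in.

```c
#include <stddef.h>

struct fold_pair {
    int from;   /* code point with a diacritic */
    int to;     /* lowercase base letter */
};

/* Sample entries only, sorted by 'from' for binary search. */
static const struct fold_pair fold_table[] = {
    {0x00C7, 'c'},  /* LATIN CAPITAL LETTER C WITH CEDILLA */
    {0x00C9, 'e'},  /* LATIN CAPITAL LETTER E WITH ACUTE */
    {0x00E7, 'c'},  /* LATIN SMALL LETTER C WITH CEDILLA */
    {0x00E9, 'e'},  /* LATIN SMALL LETTER E WITH ACUTE */
};

/* Lowercase ASCII letters and strip the diacritics listed above.
   Any other code point is returned unchanged. */
int uni_tonormalize(int ch)
{
    size_t lo = 0;
    size_t hi = sizeof fold_table / sizeof fold_table[0];

    if (ch >= 'A' && ch <= 'Z')
        return ch + ('a' - 'A');
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fold_table[mid].from == ch)
            return fold_table[mid].to;
        if (fold_table[mid].from < ch)
            lo = mid + 1;
        else
            hi = mid;
    }
    return ch;
}
```

In this sketch the four characters from the example above (C with cedilla in both cases, plus plain C and c) all come out as c.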
> 3. I have started to construct a variant equivalence
> table for Chinese characters. But if I put that into
> the above tonormalize, the table will be very big.
> I have thought of doing the mapping when the input code
> is converted into Unicode (instead of converting them to
> different variant-equivalent forms, convert them to
> a chosen variant form). That way, we only need to
> modify the big5, gb, and jis to Unicode tables. But I am
> not sure whether this hack is good or bad.
I think this table should be built not in big5 or gb
form but in Unicode form, like toupper/tolower.
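The point of keeping the table in Unicode form is that one table then serves every source charset: text is first converted to Unicode by the usual big5/gb/jis routines, and the variant folding runs afterwards. A minimal sketch, assuming strings of int code points terminated by 0; the function names are hypothetical and the three table entries are illustrative picks that should be checked against real variant data.

```c
#include <stddef.h>

struct variant_pair {
    int variant;    /* code point of a variant form */
    int canonical;  /* the chosen form it is folded to */
};

/* Illustrative entries only; a real table would be much larger
   and generated from a data file. */
static const struct variant_pair variant_table[] = {
    {0x6237, 0x6236},
    {0x6238, 0x6236},
    {0x8AAC, 0x8AAA},
};

/* Map one code point to its chosen variant form, or return it as is. */
int uni_tovariant(int ch)
{
    size_t i;
    for (i = 0; i < sizeof variant_table / sizeof variant_table[0]; i++)
        if (variant_table[i].variant == ch)
            return variant_table[i].canonical;
    return ch;
}

/* Fold a whole Unicode string in place, after charset conversion. */
void uni_fold_variants(int *s)
{
    for (; *s != 0; s++)
        *s = uni_tovariant(*s);
}
```

The linear scan keeps the sketch short; a table of realistic size would be sorted and searched the same way as the case tables.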
> 4. As mnogosearch is an open source project, I have some
> difficulty contributing the code directly: I
> cannot get permission from my boss even if I write
> the code on my own time. So, before sending you the
> patch, I would like to hear from you.
Can you hear me? :-)
By the way, just curious:
why doesn't your boss allow you to contribute to an open
source project?
--
bar