Re: Chinese Support

Alexander Barkov Thu, 21 Feb 2002 03:17:28 -0800

Hello!

kent sin wrote:

> Dear Mr. Barkov,
> 
> I have read unicode.c and would like to make the
> following changes. Before I do, I would like to hear
> your advice:
> 
> 1. separate unicode.c into:
> 
>    unicode-convert.h  unicode-convert.c
>    unicode.h
>    unicode.c
>
>    where unicode-convert.h contains all the code conversion
> tables and unicode-convert.c contains the conversion
> code; unicode.h contains the remaining tables and
> unicode.c contains the remaining code.

This is already done in the latest CVS sources. I divided the
Unicode-related code into 3 files:

unicode.c - routines operating on Unicode strings
             (like strdup, strcat, but for Unicode strings)

uniconv.c - character set conversion routines.

unidata.c - toupper/tolower/ctype Unicode routines.
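As an illustration of what "strdup, strcat, but for Unicode strings" means, here is a minimal sketch. It assumes a Unicode string is a zero-terminated array of int code points; the names uni_strlen, uni_strdup and uni_strcat are hypothetical and not necessarily what the real unicode.c uses.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: a Unicode string is a zero-terminated array of int
   code points. All names here are hypothetical. */

/* Like strlen(): number of code points before the terminating 0. */
size_t uni_strlen(const int *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Like strdup(): returns a malloc'ed copy, or NULL if allocation fails. */
int *uni_strdup(const int *s)
{
    size_t n = uni_strlen(s) + 1;          /* include the terminating 0 */
    int *d = malloc(n * sizeof(int));
    if (d != NULL)
        memcpy(d, s, n * sizeof(int));
    return d;
}

/* Like strcat(): dst must have room for both strings plus the 0. */
int *uni_strcat(int *dst, const int *src)
{
    int *end = dst + uni_strlen(dst);
    while ((*end++ = *src++) != 0)         /* copies the terminator too */
        ;
    return dst;
}
```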

>    I would like to write some scripts to generate the
> two .h files from the data published at unicode.org. That
> way, when a new Unicode release comes out, we only need to
> generate the new .h files and recompile.

Do you really think we wrote all the tables manually?  :-)
Of course, we have such scripts. I wrote 3 scripts to generate
sources for:

1. conversion routines for 8-bit charsets,
    using files like this one:
     ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-5.TXT

2. conversion routines for multibyte charsets, using files like this one:

ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT

3. tolower/toupper/ctype routines, using this file:
ftp://ftp.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt
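A generator of this kind could start from a line parser like the one below. This is a minimal sketch, not the actual scripts from the "unicode" repository: it assumes the two-column hex layout of the unicode.org mapping files (charset code, Unicode code, then a # comment), and parse_mapping_line, build_8bit_table and print_table are hypothetical names. Multibyte files such as CP932.TXT use the same layout, but their codes do not fit a 256-entry table.

```c
#include <stdio.h>
#include <string.h>

/* Parse one mapping line such as
   "0xA1<TAB>0x0401<TAB>#<TAB>CYRILLIC CAPITAL LETTER IO".
   Returns 1 on success; 0 for comment, blank or unmapped lines
   (e.g. "0x81<TAB><TAB>#UNDEFINED"). %lx accepts the 0x prefix. */
int parse_mapping_line(const char *line, unsigned long *charset_code,
                       unsigned long *unicode)
{
    if (line[0] == '#')
        return 0;
    return sscanf(line, "%lx %lx", charset_code, unicode) == 2;
}

/* Fill a 256-entry table for an 8-bit charset; unmapped bytes stay 0. */
void build_8bit_table(const char *const *lines, size_t nlines,
                      unsigned int table[256])
{
    memset(table, 0, 256 * sizeof(table[0]));
    for (size_t i = 0; i < nlines; i++) {
        unsigned long from, to;
        if (parse_mapping_line(lines[i], &from, &to) && from < 256)
            table[from] = (unsigned int) to;
    }
}

/* Emit the table as C source, as a generator script might. */
void print_table(FILE *out, const char *name, const unsigned int table[256])
{
    fprintf(out, "static const unsigned int %s[256] = {\n", name);
    for (int i = 0; i < 256; i++)
        fprintf(out, "0x%04X,%s", table[i], (i % 8 == 7) ? "\n" : " ");
    fprintf(out, "};\n");
}
```

Fed the lines of 8859-5.TXT, build_8bit_table would, for example, store U+0401 at index 0xA1.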

Those scripts are available in a separate CVS repository, "unicode",
so you can obtain them if you want. Take a look at README.CVS
and just use "unicode" instead of "mnogosearch" as the project name.

> 2. add a tonormalize routine which is similar to tolower,
> but also maps characters with diacritics to their lowercase
> equivalents without diacritics. For example, it will change
> C, c and their accented forms to c.

Yes, we want to add such a feature soon. I can't guarantee
it, but I hope 3.2.4 will already have it, if we have
enough time.
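A tonormalize routine of the kind discussed above could look roughly like this sketch. The name uni_tonormalize and its tiny hand-written table are hypothetical; a generated version would more likely derive its entries from the decomposition field of UnicodeData.txt (for example, U+00E7 decomposes to 0063 0327) and use a direct lookup instead of a linear scan.

```c
#include <stddef.h>

/* Hypothetical tonormalize(): lower-case a code point and strip its
   diacritic. The table is a small illustrative sample only. */
struct norm_entry { int from; int to; };

static const struct norm_entry norm_table[] = {
    { 0x00C7, 'c' },  /* LATIN CAPITAL LETTER C WITH CEDILLA */
    { 0x00C9, 'e' },  /* LATIN CAPITAL LETTER E WITH ACUTE */
    { 0x00E7, 'c' },  /* LATIN SMALL LETTER C WITH CEDILLA */
    { 0x00E9, 'e' },  /* LATIN SMALL LETTER E WITH ACUTE */
    { 0x010C, 'c' },  /* LATIN CAPITAL LETTER C WITH CARON */
    { 0x010D, 'c' },  /* LATIN SMALL LETTER C WITH CARON */
};

int uni_tonormalize(int c)
{
    if (c >= 'A' && c <= 'Z')              /* plain ASCII: like tolower */
        return c + ('a' - 'A');
    for (size_t i = 0; i < sizeof(norm_table) / sizeof(norm_table[0]); i++)
        if (norm_table[i].from == c)
            return norm_table[i].to;
    return c;                              /* not in the table: unchanged */
}
```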

> 3. I have started to construct a variant equivalence
> table for Chinese characters. But if I put that into
> the above tonormalize, it will be a very big table.
> I have thought of doing the mapping when the input code
> is converted into Unicode: instead of converting variants
> to their different equivalent forms, convert them all to
> one chosen variant form. That way, we only need to
> modify the big5, gb and jis to Unicode tables. But I am
> not very sure whether this hack is good or bad.

I think this table should be built in Unicode form, not in
Big5 or GB form, just like the toupper/tolower tables.
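Keeping the variant table in Unicode form, like toupper/tolower, might look like the sketch below: a sorted array of (variant, canonical) pairs searched with the standard bsearch(). The name uni_fold_variant is hypothetical, and the two entries, as well as the choice of which form counts as canonical, are illustrative only. One advantage of this layout over patching the Big5/GB/JIS conversion tables is that the same folding applies no matter which charset the text arrived in.

```c
#include <stdlib.h>

/* Hypothetical variant-folding table: each entry maps a variant code
   point to the form chosen as canonical. Must stay sorted by 'variant'
   for bsearch(). Entries are illustrative only. */
struct variant_entry { unsigned int variant; unsigned int canonical; };

static const struct variant_entry variant_table[] = {
    { 0x5CEF, 0x5CF0 },  /* U+5CEF -> U+5CF0 */
    { 0x88CF, 0x88E1 },  /* U+88CF -> U+88E1 */
};

/* bsearch() comparator: key is an unsigned int code point. */
static int cmp_variant(const void *key, const void *elem)
{
    unsigned int k = *(const unsigned int *) key;
    unsigned int v = ((const struct variant_entry *) elem)->variant;
    return (k > v) - (k < v);
}

/* Return the chosen canonical form, or c itself if it has no variant. */
unsigned int uni_fold_variant(unsigned int c)
{
    const struct variant_entry *e =
        bsearch(&c, variant_table,
                sizeof(variant_table) / sizeof(variant_table[0]),
                sizeof(variant_table[0]), cmp_variant);
    return e != NULL ? e->canonical : c;
}
```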

> 4. As mnogosearch is an open source project, it is a
> little difficult for me to contribute code directly: I
> cannot get permission from my boss, even though I write
> the code in my own time. So, before sending you the
> patch, I would like to hear from you.

Can you hear me?    :-)

By the way, just curious...

Why doesn't your boss allow you to contribute to open
source projects?

-- 

   bar
