Re: [Python-Dev] Unicode 5.1.0

Fredrik Lundh Fri, 22 Aug 2008 08:19:58 -0700

On Fri, Aug 22, 2008 at 4:59 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:


>> (how's the 3.2/4.1 dual support implemented?  do we have two distinct
>> datasets, or are the differences encoded in some clever way?  would it
>> make sense to split the unicodedata module into three separate
>> modules, one for each major Unicode version?)
>
> The current API looks fine to me: unicodedata is the latest version
> whereas unicodedata.ucd_3_2_0 is the older version. The APIs are the
> same; there's a tiny bit of code in the generated _db.h file that
> expresses the differences:
>
> static const change_record* get_change_3_2_0(Py_UCS4 n)
> {
>     int index;
>     if (n >= 0x110000) index = 0;
>     else {
>         index = changes_3_2_0_index[n>>7];
>         index = changes_3_2_0_data[(index<<7)+(n & 127)];
>     }
>     return change_records_3_2_0+index;
> }

there's a bunch of data tables as well, but they don't seem to be very
large.  looks like Martin did a thorough job here.
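
for reference, the two-level lookup in the quoted function can be
sketched in python roughly as below.  this is only an illustration:
build_tables and get_change are hypothetical names, the table contents
are made up, and treating record 0 as "no change" is an assumption
based on the out-of-range branch in the C code, not something taken
from the generator script.

```python
SHIFT = 7
BLOCK = 1 << SHIFT  # 128 code points per block, as in the C snippet


def build_tables(changes, limit=0x110000):
    """Compress a sparse {code point: record number} dict.

    Returns (index, data): index maps a block number to a slot in data,
    and identical 128-entry blocks are stored only once.
    Record number 0 is assumed to mean "no change".
    """
    index = []
    data = []
    seen = {}  # block contents -> position of that block in data
    for start in range(0, limit, BLOCK):
        block = tuple(changes.get(cp, 0) for cp in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(data) // BLOCK
            data.extend(block)
        index.append(seen[block])
    return index, data


def get_change(index, data, n):
    """Mirror of the C lookup: high bits pick a block, low 7 bits an entry."""
    if n >= 0x110000:
        return 0
    i = index[n >> SHIFT]
    return data[(i << SHIFT) + (n & (BLOCK - 1))]
```

since most blocks contain no changes at all, they collapse into a single
shared all-zero block, which would explain why the tables stay small.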

... digging digging digging ...

yes, the generator script produces difference tables between the main
version and a list of older versions.  I'd say it's worth running the
script on the 5.1.0 tables, and if it doesn't choke, compare the
resulting table with the corresponding table for 4.1.0 (a simple loop
fetching the main properties for all code points).  if the differences
look reasonably small, switch to 5.1.0 and keep the others.
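
the "simple loop" could look something like the sketch below.  it is
written for python 3 and compares any two database objects that expose
the unicodedata API; the choice of which properties count as "main" and
the helper names (main_properties, diff_databases) are assumptions made
for illustration.

```python
import sys
import unicodedata

# Property functions available on both the module and UCD objects
# such as unicodedata.ucd_3_2_0.
PROPERTIES = ("category", "bidirectional", "combining",
              "east_asian_width", "mirrored", "decomposition")


def main_properties(db, ch):
    """Fetch the chosen properties of one character from one database."""
    return tuple(getattr(db, name)(ch) for name in PROPERTIES)


def diff_databases(new, old, limit=sys.maxunicode + 1):
    """Return {code point: (old properties, new properties)} where they differ."""
    diffs = {}
    for cp in range(limit):
        ch = chr(cp)
        a = main_properties(new, ch)
        b = main_properties(old, ch)
        if a != b:
            diffs[cp] = (b, a)
    return diffs
```

calling diff_databases(unicodedata, unicodedata.ucd_3_2_0) gives the
set of code points whose properties differ between the main database
and the 3.2.0 view; the same loop would work for a 4.1.0 vs 5.1.0
comparison if both databases were available as objects.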

I can tinker a little with this over the weekend, unless Martin tells
me not to ;-)

</F>
