On Thu, Jun 20, 2002 at 12:03:12AM -0400, Seer wrote:
> 
> I wrote:
> >Why don't you just use that data of the width specification from the
> >LC_CTYPE of the current locale? That is probably much faster to get
> >access to, it is just one indexing operation instead of a number of if's.
> >And you get free upgrade when the locale is upgraded.
> 
> first off LC_CTYPE is an outdated concept, and should be dispensed with.
> (the width function im thinking of expects utf-8, and the system wcwidth
>  expects ucs-4, locale regardless ) Locales should really be about how
> you want to see dates/times formatted, and which string constants you
> want apps to spit out in error messages and the like. They should desist
> supporting outmoded encodings. (iso-8859-*,euc-*,*jis*,tcvn&friends)

Have you heard of the Turkish i, of outdigit, and of title casing for
the Dutch IJ? There are some things in LC_CTYPE that are locale
dependent (admittedly most should be the same). The width returned by
wcwidth is actually also one of the culturally dependent attributes of
characters. Many East Asians expect many Latin, Greek and Cyrillic
characters to be fullwidth, while Westerners expect them to be halfwidth.

> secondly, having a 2 megabyte array around just for looking up widths
> can be faster, assuming you have a really good virtual memory system,
> and gobs of ram. However, i still think a tree is the correct
> compromise.

Surely you can implement a table lookup that is much faster, and also
less memory-consuming. One strategy is to have arrays of arrays of
arrays, each level indexed by, say, one octet of the 10646 value.
Another is to keep one entry per 256-entry page: either all widths on
the page are the same and the entry is that width, or the entry is an
index to the page's actual values. This is not O(log n) at all; it is a
fixed number of indexing steps. It is of course trading space for speed,
but I think you could have a table for wcwidth well under 64 k using
just the simple techniques mentioned here.
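
A minimal sketch in C of the second strategy, to make the idea concrete.
Everything named wt_* is hypothetical, the ranges are a tiny toy data
set rather than a real width table, only U+0000..U+FFFF is covered, and
control characters are not treated specially the way a real wcwidth
would treat them. In practice the tables would be generated offline
from the locale data; building them at startup just keeps the example
short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WT_PAGE_SIZE   256  /* code points per page */
#define WT_NUM_PAGES   256  /* U+0000..U+FFFF only in this sketch */
#define WT_MAX_DETAIL  8    /* room for mixed pages in the toy data */
#define WT_DETAIL_BASE 3    /* stage-1 values 0..2 are literal widths */

struct wt_range { uint32_t first, last; uint8_t width; };

/* Toy data for illustration only; everything else defaults to width 1. */
static const struct wt_range wt_toy_ranges[] = {
    { 0x0300, 0x036F, 0 },  /* combining diacritical marks */
    { 0x1100, 0x115F, 2 },  /* Hangul Jamo initial consonants */
    { 0x4E00, 0x9FFF, 2 },  /* CJK unified ideographs */
};

static uint8_t wt_stage1[WT_NUM_PAGES];
static uint8_t wt_detail[WT_MAX_DETAIL][WT_PAGE_SIZE];
static int wt_detail_used;

/* Build the two-stage table from the range list. */
void wt_init(void)
{
    static uint8_t flat[WT_NUM_PAGES * WT_PAGE_SIZE]; /* scratch only */
    size_t r, nranges = sizeof wt_toy_ranges / sizeof wt_toy_ranges[0];
    uint32_t cp;
    int p, i;

    memset(flat, 1, sizeof flat);
    for (r = 0; r < nranges; r++)
        for (cp = wt_toy_ranges[r].first; cp <= wt_toy_ranges[r].last; cp++)
            flat[cp] = wt_toy_ranges[r].width;

    wt_detail_used = 0;
    for (p = 0; p < WT_NUM_PAGES; p++) {
        const uint8_t *page = flat + (size_t)p * WT_PAGE_SIZE;
        int uniform = 1;

        for (i = 1; i < WT_PAGE_SIZE; i++)
            if (page[i] != page[0]) { uniform = 0; break; }

        if (uniform) {
            /* Whole page shares one width: store it directly. */
            wt_stage1[p] = page[0];
        } else {
            /* Mixed page: store an index to a 256-entry detail page. */
            assert(wt_detail_used < WT_MAX_DETAIL);
            memcpy(wt_detail[wt_detail_used], page, WT_PAGE_SIZE);
            wt_stage1[p] = (uint8_t)(WT_DETAIL_BASE + wt_detail_used);
            wt_detail_used++;
        }
    }
}

/* At most two indexing operations; -1 outside the sketch's range. */
int wt_width(uint32_t cp)
{
    uint8_t e;

    if (cp > 0xFFFF)
        return -1;
    e = wt_stage1[cp >> 8];
    if (e < WT_DETAIL_BASE)
        return e;
    return wt_detail[e - WT_DETAIL_BASE][cp & 0xFF];
}

/* Bytes used by the lookup tables proper (scratch buffer excluded). */
size_t wt_table_bytes(void)
{
    return sizeof wt_stage1 + (size_t)wt_detail_used * WT_PAGE_SIZE;
}
```

After wt_init(), wt_width(0x4E2D) takes the single-index path because
its whole page is wide, while wt_width(0x0301) goes through a detail
page. With the toy data only two pages are mixed, so the tables take
256 + 2*256 bytes. Real data has more mixed pages, and packing widths
into 2 bits each would shrink the detail pages further.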

Kind regards
keld
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
