Re: Wide strings

Andy Wingo Wed, 28 Jan 2009 10:38:32 -0800

Hi,

On Wed 28 Jan 2009 17:44, Mike Gran <spk...@yahoo.com> writes:


> Since I need this functionality taken care of, and since I have some
> time to play with it, what's the procedure here?

The best thing IMO would be to hack on it on a Git branch, with small
and correct patches. We could get you commit access if you don't already
have it (Ludo or Neil would have to reply on that). Then you could push
your work directly to a branch, so we all can review it easily.

> Do we need to talk more about what needs to be accomplished? Do we
> need a complete specification? Do we need a vote on if it is a good
> idea?

I think you're going in the right direction. More importantly, although
I can't speak for them, Neil and Ludo seem to think so too.

> 1.  Convert the internal char and string representation to be 
> explicitly ISO 8859-1.  Add the to/from locale conversion functionality
> while still retaining 8-bit strings.  Replace C library funcs with 
> Gnulib string funcs where appropriate.

Sounds appropriate to me. I am unfamiliar with the gnulib code; where do
the Unicode codepoint tables live? How does one update them? Do we get
full introspection on characters and their classes, properties, etc.?

> 2.  Convert the internal representation of chars to 4-byte 
> codepoints, while still retaining 8-bit strings.

Currently, characters are immediate values with an 8-bit tag; see
tags.h:333. So it seems we have 24 bits remaining, and Unicode code
points only go up to U+10FFFF, which needs 21 bits. So we're good, if
you can figure out a reasonable way to go from a 32-bit codepoint to a
24-bit codepoint.

> 3.  Convert strings to be a union of 1 byte and 4 byte chars.

There's room on stringbufs to have a flag, I think. Dunno if that's the
right way to do it. Converting the symbols and keywords code to do the
right thing will be a little bit of work, too.

Happy hacking,

Andy
-- 
http://wingolog.org/
