Dan Sugalski wrote:
> 
>...
> 
> The only time you need to deal with the actual encoding of a string is when
> doing I/O (generally specified as an attribute on the filehandle) or when
> someone feels the need to hit the bits directly. The latter should be
> rather less common than the former, I hope, but still doable.

Great.

> >I think that the extra complexity of dealing with multiple character
> >sets has more cost than benefit. What will chr(10203) return?
> 
> The default character set's chr(10203). In which case it's no different
> than chr(65), which isn't an A on EBCDIC platforms... :)

Is it really a good idea for the meaning of your Perl program to change
in this way between platforms? In XML we tried hard not to do that. Java
and JavaScript are also good about this. Python does not expose its
default encoding machinery either (or did not last time I checked).

It seems like just one more platform dependency that the programmer must
be careful of.
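
To make the portability concern concrete, here is a minimal Python sketch
of what "chr(65) isn't an A on EBCDIC platforms" means in practice. It
assumes Python's cp037 codec is a fair stand-in for an EBCDIC platform's
default character set, and the helper char_for is a hypothetical name used
only for illustration; it is not a description of how Perl would behave.

```python
# Interpret the same number, 65, under two different character sets.
# Assumption: the cp037 codec stands in for "an EBCDIC platform".

def char_for(code: int, charset: str) -> str:
    """Interpret a single byte value under the named 8-bit character set."""
    return bytes([code]).decode(charset)

ascii_char = char_for(65, "ascii")    # 'A'
ebcdic_char = char_for(65, "cp037")   # some other character, not 'A'
unicode_char = chr(65)                # Python's chr is always Unicode: 'A'

print(repr(ascii_char), repr(ebcdic_char), repr(unicode_char))

# In EBCDIC (cp037) the letter A lives at a different number entirely.
ebcdic_code_for_a = "A".encode("cp037")[0]
print("cp037 code for 'A':", ebcdic_code_for_a)
```

If chr follows the platform's default character set, the first two results
are what a script would see on different machines; if it is pinned to one
character set, every platform sees the third.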

>...
> We're only going to do variable width for I/O, and only if the source or
> destination are in a variable width format. The internal bits that need to
> care will work on fixed-width representations.

Then you'll pay the memory cost for Unicode up-front. I'd suggest you
also take advantage of the simplification you get from using its
character set.
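
As a rough illustration of that memory cost, the sketch below compares a
variable-width encoding with a fixed-width one. It assumes UTF-8 and UTF-32
as the two representations purely for the sake of example; the quoted
message does not say which fixed-width format would be used internally.

```python
# Compare storage for a variable-width and a fixed-width representation.
# Assumption: UTF-8 and UTF-32 are illustrative choices, not the actual design.

def sizes(text: str) -> tuple[int, int]:
    """Return (bytes as UTF-8, bytes as UTF-32 without a byte-order mark)."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))

plain = "hello"                       # 5 ASCII characters
mixed = "h\u00e9llo \u4e16\u754c"     # 8 characters, some outside ASCII

plain_sizes = sizes(plain)
mixed_sizes = sizes(mixed)

print("plain:", plain_sizes)
print("mixed:", mixed_sizes)
```

The fixed-width form always costs four bytes per character here, whatever
the text contains, which is the price paid for simple indexing.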

In principle I have nothing against a multi-character-set system, but I
have a sense that the details are going to be extremely hairy, and I'm
afraid that those details will bubble up to the programmer and make the
usage model harder than on VMs that standardize (essentially every other
VM in the world!).

 Paul Prescod
