Bug#99933: Comments on Unicode

Raul Miller Fri, 06 Jul 2001 07:53:47 -0500

On Fri, Jul 06, 2001 at 04:36:25AM +0100, David Starner wrote:
> > Do you have any idea whether the problems identified at
> > http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP
> > have been resolved?
> 
> Are they a problem for us? Windows Code Page 932 may or may not correspond
> to anything that we care about. (At a glance, at least one of each pair that
> both correspond to the same Unicode character is not in the real JIS X
> 0208.)

If it's indeed the case that this is a CP 932 problem and not a Shift JIS
problem, and if it's indeed the case that we don't support CP 932, then
I'll agree that this isn't a problem.
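
For illustration, here is a minimal sketch of the kind of duplicate
mapping under discussion. It assumes Python's built-in "cp932" codec
follows Microsoft's table, and the byte pair below (one sequence from
NEC row 13, one from the IBM extension area) is an example chosen for
this sketch, not one quoted from the article.

```python
# Two distinct CP932 byte sequences (an assumed example pair) that are
# expected to decode to the same Unicode character.
nec = b"\x87\x54"   # NEC row 13
ibm = b"\xfa\x4a"   # IBM extension area

ch_nec = nec.decode("cp932")
ch_ibm = ibm.decode("cp932")
same_char = ch_nec == ch_ibm

# Encoding depends only on the character, so if both sequences decode to
# the same character, at most one of them can survive a round trip.
round_trips = [raw.decode("cp932").encode("cp932") == raw for raw in (nec, ibm)]

print(f"U+{ord(ch_nec):04X}", f"U+{ord(ch_ibm):04X}", same_char, round_trips)
```

If neither byte sequence belongs to plain Shift JIS (JIS X 0208), the
round-trip loss is specific to the vendor extensions in CP 932.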

> > Prior to Unicode 3.1 the code space was 16 bits.
>
> NO. Since Unicode 2.0, the code space has been 21 bits. The ONLY thing
> that Unicode 3.1 did, is put characters above U+FFFF. It did not
> change the fundamental structure of Unicode in the least.

I stand corrected.
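
As a small sketch of the point being conceded: the highest code point is
U+10FFFF, which needs 21 bits, and anything above U+FFFF is written in
UTF-16 as a surrogate pair. The helper name utf16_units is hypothetical,
introduced only for this example.

```python
MAX_CODE_POINT = 0x10FFFF
bits_needed = MAX_CODE_POINT.bit_length()  # number of bits for the top code point


def utf16_units(cp):
    """Return the UTF-16 code units for code point cp (illustrative helper).

    Does not reject lone surrogates (U+D800..U+DFFF); kept minimal on purpose.
    """
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000                      # 20-bit offset above the BMP
    return [0xD800 + (v >> 10),           # high surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]         # low surrogate: bottom 10 bits


def units_from_codec(cp):
    """Cross-check using Python's own UTF-16 encoder."""
    raw = chr(cp).encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


print(bits_needed, [hex(u) for u in utf16_units(0x10000)])
```

The 16-bit code units stay the same width; only the range of code points
that actually have characters assigned grows.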

> > Once Unicode can act as a super set for every character set we currently
> > support, we can use it as such.  Until then, we can't.
> 
> If Unicode were a super set for every character set that anyone needs to
> support, it would be worthless and completely unusable.

I didn't say for any character set that anyone needs to support.
I said for every character set we currently support.  I hope you see the
difference.  [And, as an aside, I should have said "for each character
set that we currently support" -- I understand that Unicode doesn't need
to support mixed character set usage before we migrate.]

> However, if we currently support any character set well, it is through
> a Unicode based glibc - I don't believe libc accepts the existence of
> any character set that can't be mapped to Unicode. So arguably, yes,
> Unicode is a super set for every character set we currently support
> well.

Assuming we're using glibc support (e.g. toupper()) for all those
character sets, I'll agree that you have a good point.

On 20010705T133736-0400, Raul Miller wrote:
> > in HTML the language can only be identified in the mime header.

On Fri, Jul 06, 2001 at 11:23:42AM +0300, Antti-Juhani Kaijanaho wrote:
> There is no such thing as a MIME header in HTML.
>
> Besides, HTML does include the lang attribute for most elements. I
> wonder what it's for if not for indicating the language.

I stand corrected.

Thanks,

-- 
Raul
