Re: Embedded two-byte representations of marked alphabetic characters in SBCSs

Paul Gilmartin Sat, 05 Oct 2013 18:05:00 -0700

On Sat, 5 Oct 2013 15:12:09 -0400, John Gilmore wrote:
> . . .
>The difficulty arises when a convention for representing 'é' as two
>successive byte values of the form
>
><-minuscule-e code point><accent-aigu code point>
>
>in one code page collides with the single-byte representation of '�'
>and '�' as just these two unique code points in another code page.
> 
As I see it, the difficulty arises from a misguided attempt to interpret
UTF-8 as an SBCS page, probably aggravated by the lack of a proper
document header.
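
A minimal sketch of that failure mode, assuming ISO 8859-1 (Latin-1) stands in
for the SBCS page; the thread does not name a specific code page, so that
choice is only illustrative. The two bytes UTF-8 uses for one character are
each read as a separate single-byte character:

```python
# One character, U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
text = "\u00e9"

# UTF-8 encodes it as two bytes.
utf8_bytes = text.encode("utf-8")

# Assumption: the reader wrongly treats the data as Latin-1, an SBCS.
# Each byte then becomes its own character (U+00C3 and U+00A9).
misread = utf8_bytes.decode("latin-1")

# Decoding with the encoding that was actually used recovers the original.
correct = utf8_bytes.decode("utf-8")

print(utf8_bytes, len(misread), correct == text)
```

With a header that declares the encoding, the reader has no reason to guess
and the second decode is the only one attempted.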


>Regrettably, Unicode has carried alternative support for the generic
>
><basic alphabetic code point><modifier code point>
> 
My regret is not profound.

>scheme forward; and its availability and heavy use in some contexts
>needs to figure in the sorts of discussions that have been going on
>here during the last few days.   It greatly complicates translation in
>a fashion that is of no conceptual interest but is messy.
> 
But this deviates from the typewriter convention of striking the
(nonescaping) modifier key before the basic alphabetic key.
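
For reference, a small sketch of the ordering Unicode uses, with Python's
standard unicodedata module: the decomposed form places the combining mark
after the base letter, the reverse of the typewriter order. The variable
names are only illustrative.

```python
import unicodedata

precomposed = "\u00e9"  # single code point, e with acute

# NFD splits it into base letter + combining mark, in that order.
decomposed = unicodedata.normalize("NFD", precomposed)
code_points = [f"U+{ord(c):04X}" for c in decomposed]

# NFC recombines the pair into the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)

print(code_points, decomposed.encode("utf-8"), recomposed == precomposed)
```

Normalizing both sides to the same form before comparing or translating
avoids treating the two spellings as different text.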

-- gil
