I'm cc'ing the X/2 list, as IMHO this may be of general interest:
On 22 Jul 1999 14:34:07 +0900, [EMAIL PROTECTED] wrote:
>"Arnd Hanses" <[EMAIL PROTECTED]> wrote:
>
>> -
>> + name.subst('ß', 'z'); /* AHanses: no more trouble with 'umlauts'? */
>> + name.subst('ä', 'a');
>> + name.subst('ö', 'o');
>> + name.subst('ü', 'u');
>> + name.subst('æ', 'e');
>> + name.subst('ñ', 'n');
>> + name.subst('Ä', 'A'); /* AHanses: For OS/2 'lowercase()' won't help */
>> + name.subst('Ö', 'O');
>> + name.subst('Ü', 'U');
>> + name.subst('Æ', 'A');
>> + name.subst('Ñ', 'U');
>> +#warning AHanses: List is incomplete. Please find a more generic solution.
>> +
>
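One possible shape for the "more generic solution" the #warning asks for is a single 256-entry lookup table instead of a chain of subst() calls. This is only a minimal sketch: Latin1Fold and apply() are hypothetical names, it works on std::string rather than LString, and it assumes the input is ISO-8859-1. The table holds the pairs from the patch above, except that 0xD1 is mapped to 'N' here (the patch has 'U', which looks like a typo).

```cpp
#include <string>

// Hypothetical helper, not part of LyX: fold ISO-8859-1 bytes to ASCII
// with one pass through a 256-entry lookup table.
struct Latin1Fold {
    unsigned char table[256];

    Latin1Fold() {
        // Identity mapping first, so unlisted bytes pass through unchanged.
        for (int i = 0; i < 256; ++i)
            table[i] = static_cast<unsigned char>(i);
        // Pairs taken from the quoted patch (ISO-8859-1 code points).
        table[0xDF] = 'z';  // sharp s
        table[0xE4] = 'a';  // a with diaeresis
        table[0xF6] = 'o';  // o with diaeresis
        table[0xFC] = 'u';  // u with diaeresis
        table[0xE6] = 'e';  // ae ligature
        table[0xF1] = 'n';  // n with tilde
        table[0xC4] = 'A';  // A with diaeresis
        table[0xD6] = 'O';  // O with diaeresis
        table[0xDC] = 'U';  // U with diaeresis
        table[0xC6] = 'A';  // AE ligature
        table[0xD1] = 'N';  // N with tilde (assumption: patch has 'U')
    }

    // Return a copy of `in` with every byte replaced by its table entry.
    std::string apply(const std::string& in) const {
        std::string out(in);
        for (std::string::size_type i = 0; i < out.size(); ++i)
            out[i] = static_cast<char>(
                table[static_cast<unsigned char>(out[i])]);
        return out;
    }
};
```

Adding a character then means adding one table entry, and a different table could be filled in per encoding.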
>As I said, don't do this here. It is much more efficient to bitwise-AND
>the whole string with 0x7f:
>
>LString& LString::discardSign()
>{
> // Clear the high bit of every byte (bitwise AND, not OR).
> for (int i = 0; i < length(); i++)
> p->s[i] &= 0x7f;
> return *this;
>}
>
>ISO-8859-x and EUC (Extended Unix Code) work just fine with this.
>And, also as I said, this must be called before the special character
>handling is performed here, *BUT* after 0xaf, 0xba and 0xdc have been
>converted to something different.
>The only remaining problem is that the Microsoft/IBM codepages, Shift-JIS
>and Big5 use the region 0x80-0x9f, which would be converted to
>non-printable control characters. E.g.
>
># Format: Three tab-separated columns (Sorry tabs are expanded)
># Column #1 is the cp850_DOSLatin1 code (in hex)
># Column #2 is the Unicode (in hex as 0xXXXX)
># Column #3 is the Unicode name (follows a comment sign, '#')
>0x80 0x00c7 #LATIN CAPITAL LETTER C WITH CEDILLA
>0x81 0x00fc #LATIN SMALL LETTER U WITH DIAERESIS
>0x82 0x00e9 #LATIN SMALL LETTER E WITH ACUTE
>0x83 0x00e2 #LATIN SMALL LETTER A WITH CIRCUMFLEX
>0x84 0x00e4 #LATIN SMALL LETTER A WITH DIAERESIS
>0x85 0x00e0 #LATIN SMALL LETTER A WITH GRAVE
>0x86 0x00e5 #LATIN SMALL LETTER A WITH RING ABOVE
>0x87 0x00e7 #LATIN SMALL LETTER C WITH CEDILLA
>0x88 0x00ea #LATIN SMALL LETTER E WITH CIRCUMFLEX
>0x89 0x00eb #LATIN SMALL LETTER E WITH DIAERESIS
>0x8a 0x00e8 #LATIN SMALL LETTER E WITH GRAVE
>0x8b 0x00ef #LATIN SMALL LETTER I WITH DIAERESIS
>0x8c 0x00ee #LATIN SMALL LETTER I WITH CIRCUMFLEX
>0x8d 0x00ec #LATIN SMALL LETTER I WITH GRAVE
>0x8e 0x00c4 #LATIN CAPITAL LETTER A WITH DIAERESIS
>0x8f 0x00c5 #LATIN CAPITAL LETTER A WITH RING ABOVE
>0x90 0x00c9 #LATIN CAPITAL LETTER E WITH ACUTE
>0x91 0x00e6 #LATIN SMALL LIGATURE AE
>0x92 0x00c6 #LATIN CAPITAL LIGATURE AE
>0x93 0x00f4 #LATIN SMALL LETTER O WITH CIRCUMFLEX
>0x94 0x00f6 #LATIN SMALL LETTER O WITH DIAERESIS
>0x95 0x00f2 #LATIN SMALL LETTER O WITH GRAVE
>0x96 0x00fb #LATIN SMALL LETTER U WITH CIRCUMFLEX
>0x97 0x00f9 #LATIN SMALL LETTER U WITH GRAVE
>0x98 0x00ff #LATIN SMALL LETTER Y WITH DIAERESIS
>0x99 0x00d6 #LATIN CAPITAL LETTER O WITH DIAERESIS
>0x9a 0x00dc #LATIN CAPITAL LETTER U WITH DIAERESIS
>0x9b 0x00f8 #LATIN SMALL LETTER O WITH STROKE
>0x9c 0x00a3 #POUND SIGN
>0x9d 0x00d8 #LATIN CAPITAL LETTER O WITH STROKE
>0x9e 0x00d7 #MULTIPLICATION SIGN
>0x9f 0x0192 #LATIN SMALL LETTER F WITH HOOK
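To make the problem with the table above concrete, here is a small sketch of what the 0x7f mask does to such bytes. The names stripHighBit() and isAsciiControl() are made up for illustration, and the code uses std::string instead of LString, so treat it as a sketch and not as LyX code.

```cpp
#include <string>

// Same idea as discardSign(): clear the high bit of every byte.
std::string stripHighBit(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(static_cast<unsigned char>(s[i]) & 0x7f);
    return s;
}

// True if the byte is an ASCII control character (0x00-0x1f or 0x7f).
bool isAsciiControl(unsigned char c) {
    return c < 0x20 || c == 0x7f;
}
```

With cp850, a-diaeresis is 0x84, and 0x84 & 0x7f gives 0x04, a control character. With ISO-8859-1 the same letter is 0xe4, and 0xe4 & 0x7f gives 0x64, which is at least a printable 'd'.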
In fact, IMHO this is not an OS/2-specific issue. Please note that the
Japanese X-True-Type-Server patch to XFree now supports most of the
IBM PC and WIN codepages, in addition to many non-Latin encodings.
It is scheduled to become part of stock XFree/X11R6 soon. Many, if not
most, True-Type font users will switch to one of the new encodings, as
the special (extended) fonts for their purposes only use WINXX
encodings. Unfortunately, most of the documentation is still only
available in Japanese :(
So I'd recommend giving some thought to how this might be supported in
LyX, especially how to test for the encoding actually in use and how to
inform the user about the possibilities and problems.
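As a very rough starting point for such a test, one could look for bytes in the 0x80-0x9f range, which ISO-8859-x leaves to control codes but the PC/WIN codepages, Shift-JIS and Big5 use for text. This is only a heuristic sketch, and usesC1Range() is a hypothetical name, not an existing LyX function.

```cpp
#include <string>

// Hypothetical heuristic: report whether any byte lies in 0x80-0x9f.
// A hit hints at a PC/Windows codepage, Shift-JIS or Big5; no hit
// proves nothing about the encoding.
bool usesC1Range(const std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c >= 0x80 && c <= 0x9f)
            return true;
    }
    return false;
}
```

Note that the EUC-JP single-shift bytes 0x8e and 0x8f also fall into this range, so a hit is a hint to warn the user, not a reliable detection.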
Greets,
Arnd