Re: emx file handling patch

Arnd Hanses Thu, 22 Jul 1999 02:16:11 -0700
I'm cc'ing the X/2 list, as IMHO this may be of general interest:

On 22 Jul 1999 14:34:07 +0900, [EMAIL PROTECTED] wrote:

>"Arnd Hanses" <[EMAIL PROTECTED]> wrote:
>
>> -
>> +     name.subst('ß', 'z'); /* AHanses: no more trouble with 'umlauts'? */
>> +     name.subst('ä', 'a');
>> +     name.subst('ö', 'o');
>> +     name.subst('ü', 'u');
>> +     name.subst('æ', 'e');
>> +     name.subst('ñ', 'n');
>> +     name.subst('Ä', 'A'); /* AHanses: For OS/2 'lowercase()' won't help */
>> +     name.subst('Ö', 'O');
>> +     name.subst('Ü', 'U');
>> +     name.subst('Æ', 'A');
>> +     name.subst('Ñ', 'N');
>> +#warning AHanses: List is incomplete. Please find a more generic solution.
>> +
>
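
One possible shape for the "more generic solution" the #warning asks for is a single lookup table applied in one pass, instead of one subst() call per character. The following is only a sketch under stated assumptions: fold_latin1 is a hypothetical free function on std::string (not part of LString), the name is assumed to be ISO-8859-1 encoded, and the mappings are copied from the patch purely for illustration, with Ñ folded to 'N'.

```cpp
#include <string>

// Hypothetical helper (not LyX code): fold a few ISO-8859-1 accented
// letters to ASCII using a 256-entry translation table.
std::string fold_latin1(std::string s) {
    // Illustrative mappings taken from the patch; the list is incomplete.
    static const struct { unsigned char from; char to; } pairs[] = {
        {0xdf, 'z'}, {0xe4, 'a'}, {0xf6, 'o'}, {0xfc, 'u'},
        {0xe6, 'e'}, {0xf1, 'n'}, {0xc4, 'A'}, {0xd6, 'O'},
        {0xdc, 'U'}, {0xc6, 'A'}, {0xd1, 'N'},
    };

    // Start from the identity mapping, then overwrite the listed bytes.
    unsigned char table[256];
    for (int i = 0; i < 256; ++i)
        table[i] = static_cast<unsigned char>(i);
    for (const auto& p : pairs)
        table[p.from] = static_cast<unsigned char>(p.to);

    // Translate every byte of the string in a single pass.
    for (char& c : s)
        c = static_cast<char>(table[static_cast<unsigned char>(c)]);
    return s;
}
```

Adding a character then means adding one table entry rather than another pass over the string. Bytes that are not listed pass through unchanged, so this does not by itself guarantee a pure-ASCII result.
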
>As I said, don't do this here.  It is much more efficient to bitwise-AND
>the whole string with 0x7f:
>
>LString& LString::discardSign()
>{
>        for (int i=0; i<length() ; i++)
>                p->s[i] &= 0x7f;
>        return *this;
>}
>
>ISO-8859-x and EUC (Extended Unix Code) work just fine with this.
>And also as I said, this must be called before the special characters
>handling is performed here *BUT* after 0xaf, 0xba and 0xdc are
>converted to something different.
>The only remaining problem is that Microsoft/IBM codepages, Shift-JIS
>and Big5 use the region 0x80-0x9f, which would be converted to
>non-printable control characters.  E.g.
>
>#    Format: Three tab-separated columns (Sorry tabs are expanded)
>#        Column #1 is the cp850_DOSLatin1 code (in hex)
>#        Column #2 is the Unicode (in hex as 0xXXXX)
>#        Column #3 is the Unicode name (follows a comment sign, '#')
>0x80  0x00c7   #LATIN CAPITAL LETTER C WITH CEDILLA
>0x81  0x00fc   #LATIN SMALL LETTER U WITH DIAERESIS
>0x82  0x00e9   #LATIN SMALL LETTER E WITH ACUTE
>0x83  0x00e2   #LATIN SMALL LETTER A WITH CIRCUMFLEX
>0x84  0x00e4   #LATIN SMALL LETTER A WITH DIAERESIS
>0x85  0x00e0   #LATIN SMALL LETTER A WITH GRAVE
>0x86  0x00e5   #LATIN SMALL LETTER A WITH RING ABOVE
>0x87  0x00e7   #LATIN SMALL LETTER C WITH CEDILLA
>0x88  0x00ea   #LATIN SMALL LETTER E WITH CIRCUMFLEX
>0x89  0x00eb   #LATIN SMALL LETTER E WITH DIAERESIS
>0x8a  0x00e8   #LATIN SMALL LETTER E WITH GRAVE
>0x8b  0x00ef   #LATIN SMALL LETTER I WITH DIAERESIS
>0x8c  0x00ee   #LATIN SMALL LETTER I WITH CIRCUMFLEX
>0x8d  0x00ec   #LATIN SMALL LETTER I WITH GRAVE
>0x8e  0x00c4   #LATIN CAPITAL LETTER A WITH DIAERESIS
>0x8f  0x00c5   #LATIN CAPITAL LETTER A WITH RING ABOVE
>0x90  0x00c9   #LATIN CAPITAL LETTER E WITH ACUTE
>0x91  0x00e6   #LATIN SMALL LIGATURE AE
>0x92  0x00c6   #LATIN CAPITAL LIGATURE AE
>0x93  0x00f4   #LATIN SMALL LETTER O WITH CIRCUMFLEX
>0x94  0x00f6   #LATIN SMALL LETTER O WITH DIAERESIS
>0x95  0x00f2   #LATIN SMALL LETTER O WITH GRAVE
>0x96  0x00fb   #LATIN SMALL LETTER U WITH CIRCUMFLEX
>0x97  0x00f9   #LATIN SMALL LETTER U WITH GRAVE
>0x98  0x00ff   #LATIN SMALL LETTER Y WITH DIAERESIS
>0x99  0x00d6   #LATIN CAPITAL LETTER O WITH DIAERESIS
>0x9a  0x00dc   #LATIN CAPITAL LETTER U WITH DIAERESIS
>0x9b  0x00f8   #LATIN SMALL LETTER O WITH STROKE
>0x9c  0x00a3   #POUND SIGN
>0x9d  0x00d8   #LATIN CAPITAL LETTER O WITH STROKE
>0x9e  0x00d7   #MULTIPLICATION SIGN
>0x9f  0x0192   #LATIN SMALL LETTER F WITH HOOK
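
To make the difference concrete, the sketch below applies the same 0x7f mask to a std::string. discard_sign and is_ascii_control are hypothetical stand-ins for illustration, not LyX code. Under ISO-8859-1, ä is 0xe4 and masks to 0x64 ('d'), which is at least printable; under cp850, ä is 0x84 and masks to 0x04, a control character.

```cpp
#include <string>

// Hypothetical free-function version of the quoted discardSign():
// clear bit 7 of every byte in the string.
std::string discard_sign(std::string s) {
    for (char& c : s)
        c = static_cast<char>(static_cast<unsigned char>(c) & 0x7f);
    return s;
}

// True if the byte is an ASCII control character (0x00-0x1f or 0x7f).
bool is_ascii_control(unsigned char c) {
    return c < 0x20 || c == 0x7f;
}
```

For example, discard_sign("\x84") yields a one-byte string holding 0x04, for which is_ascii_control returns true, while discard_sign("\xe4") yields "d".
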

In fact, IMHO this is not an OS/2-specific issue. Please note that the
Japanese X-True-Type-Server patch to XFree now supports, in addition to
many non-Latin encodings, most of the IBM PC and WIN codepages.

It is scheduled to become part of stock XFree/X11R6 itself soon. Many,
if not most, True-Type font users will switch to one of the new
encodings, as the special (extended) fonts for their purposes only use
WINXX encodings. Unfortunately, most of the documentation is still
available only in Japanese :(

So I'd recommend thinking a bit about how this might be supported in LyX,
especially how to test for the encoding actually in use and how to
inform the user about possibilities and problems.

Greets,

        Arnd