Re: RE: [fd-dev] Codepage IDs

Aitor Santamaria Merino Fri, 22 Nov 2002 07:14:28 -0800

Hi,

=====
Actually, Michal and I are working on creating new .CPI files
from scratch (to be used under *any* system supporting .CPI files,
including DR-DOS, PTS-DOS, MS-DOS OEM issues, Arabic/Hebrew issues
of MS-DOS, OS/2 and Windows NT/2000/XP), so you can include and
exclude codepages as you like. Switching between the various .CPI
file formats will also be just a matter of setting a different
conditional define.
=====


This is good. One of my goals is to introduce CPI parsing routines in
DISPLAY soon, so that such a project can also be used for FreeDOS with
DISPLAY. But in order to have SOMETHING, I decided to release DISPLAY
with the easier approach of RAW files (simply a concatenation of the
x8, x14 and x16 fonts).
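
A minimal sketch of how such a RAW file could be split back into its
three fonts. It assumes 256 glyphs per font, 8-pixel-wide glyphs stored
as one byte per scan line, and the fonts concatenated in the order
8, 14, 16; none of these details is confirmed in this message, and the
function names are made up for illustration.

```python
GLYPHS = 256            # assumption: full 256-character fonts
HEIGHTS = (8, 14, 16)   # assumption: stored in this order

def split_raw_fonts(data):
    """Split a RAW blob into {height: font_bytes}."""
    expected = sum(GLYPHS * h for h in HEIGHTS)
    if len(data) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(data)))
    fonts = {}
    offset = 0
    for height in HEIGHTS:
        size = GLYPHS * height      # one byte per scan line per glyph
        fonts[height] = data[offset:offset + size]
        offset += size
    return fonts

def glyph(fonts, height, code):
    """Return the scan lines of character `code` in the font of `height`."""
    start = code * height
    return fonts[height][start:start + height]
```

Under these assumptions a RAW file is exactly 2048 + 3584 + 4096 = 9728
bytes, so a simple length check already catches most mismatches.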

======
> There's something that I would need to know for KEYB to handle
> this easily: which is the highest codepage number known?

May I refer you to the huge KBD.LST file containing all the keyboard
related news for the forthcoming issue of RBIL? I already sent you
a copy... For your convenience, here's one of the tables to be found
under INT 21h/AX=AD80h:
======

Sorry, I always forget that there's codepage info in there too! :-)

=======
> ( my wish: below 4000
> my second wish: below 8000
> my last wish: below 16000 :-((()

Country codes, Code Page IDs, and Keyboard Layout IDs are 16-bit values
and should be treated as such. Although by far not all of them were or are
used by Microsoft and IBM, the highest assignable Code Page number is
65533. 0, 65534 and 65535 are reserved as they have special meanings
for the OS itself (see below).
======

So I was out of luck!  Well, it can be patched more or less easily.


======
> CHCP is an internal program that calls kernel, which calls NLSFUNC.

Indirectly, yes. It calls the DOS kernel, which will usually call
down to NLSFUNC, which will then call back into DOS to retrieve the
info (for file I/O only). Once the info has been looked up, NLSFUNC
will return it to the DOS kernel, which will then again call
NLSFUNC in order to switch the codepage. NLSFUNC will then ask
any character device driver in the system if it supports codepage
switching. Any driver supporting codepage switching (like DISPLAY.SYS
or PRINTER.SYS, for example), will then be advised to switch the
codepage. If they return an error, NLSFUNC will return an error as
well. DISPLAY.SYS internally will also communicate with ANSI.SYS
and KEYB in order to switch display and keyboard codepages
(ANSI.SYS is not called for codepage switching as is, only for
communicating display properties).
=====

Very interesting topic. Well, some more issues here:

(1) Is this done via INT 2Fh/MUX=1Ah, or is it possible to call the
next driver in the chain somehow?

(2) In fact, I was planning to pass ALL calls except the codepage
change (generic IOCTL) on to the next driver in the chain. How can I do
that?

(3) Suppose someone installs DISPLAY.SYS before ANSI.SYS. Can I expect
that ANSI (or another CON driver loaded later) will pass any calls on
to me?
======
> (*) There is a case which, in my opinion, leads to inconsistency.
> DISPLAY.SYS is responsible for changing keyboard codepage too.
> Microsoft's implementation will switch the screen codepage regardless
> if KEYB managed to change codepage or not, which means that it would
> leave screen and keyboard with different codepages if KEYB failed.
> In my opinion, this is a bug.

In my opinion, too, but I would simply implement an option into
DISPLAY to control the behaviour - so the decision is up to the user.
I suggest to use /E for this purpose because it would somewhat correlate
with an option supported by my internal issue of DR-DOS NLSFUNC:
=====

OK, noted in the TODO list ;-))

=====
> This program is not required. It is required only if you wish to
> switch codepages "on the fly", but if you work only with one
> codepage, you may (should) initialize it with a COUNTRY= statement.
> Of course, in MS-DOS, MODE and KEYB without NLSFUNC loaded will
> fail to load fonts/layouts other than those specified in COUNTRY=.

This is correct, but still, a COUNTRY.SYS file parser is needed
not only for NLSFUNC, but also for FreeDOS' DOS BIOS.

In older issues of DR DOS, NLSFUNC has been an integral part
of the kernel, and the disk file was only a dummy for programs
expecting it to be there. But then the code moved into the
DOS BIOS, where it will get discarded after init (the driver
is temporarily linked in during the processing of the COUNTRY
directive), and into the file parsing portion of the external
NLSFUNC driver for later use.
=====

We could at least have a very basic parser that would simply read the
information in the formats defined by Steffen as chunks in a single
file. If someone manages to cut the Earth's rotation rate in half, so
that a day has 48 hours, I'd commit to doing that myself.

((I removed it when you mentioned that older currencies are no
longer in use))

There seems to be a symbol (I think in CP437) for the Spanish PESETA
that I have never seen used anywhere. Most people used Pts. for Pesetas
(which is dead WRONG, because that stands for Puntos = score) instead of
Ptas. (the correct one). But apart from that, I guess there could be
users of old PCs or PC software in other countries (say France) who
might want to use the French Franc (for some unknown reason??).

Aitor

--- Begin Message ---

On 2002-11-20, Aitor Santamaría Merino wrote:

> Are Windows charsets 8-bit codepages?
> If this is so, we could prepare (I don't know how difficult
> it would be) those codepages to be used with DISPLAY.

Such solutions already exist, although they are the result of
patching existing .CPI files. In fact, I have been proposing to
introduce this for many years now, not only for Windows Code Pages, but
also for Macintosh and ISO Code Pages, so you can view foreign files
without needing to convert them...

Actually, Michal and I are working on creating new .CPI files
from scratch (to be used under *any* system supporting .CPI files,
including DR-DOS, PTS-DOS, MS-DOS OEM issues, Arabic/Hebrew issues
of MS-DOS, OS/2 and Windows NT/2000/XP), so you can include and
exclude codepages as you like. Switching between the various .CPI
file formats will also be just a matter of setting a different
conditional define.

This is very slow work, though, as Michal and I are extremely
busy with other duties...

So volunteers are welcome, for example to create more mapping tables,
compare codepages, and collect character shapes. All this does not
require any real programming knowledge (a bit of background on NLS
issues will help, though), but beware, it is still very
time-consuming and detail-oriented, and therefore exhausting work.
It would require someone who is not after a quick hack, but a
perfect solution. There's no point in having only 99.9% correct
Code Page definitions or such. If we could hand this work over to
someone else, we could better concentrate on writing the actual
code, Michal his font editor and me the skeleton macro assembler
sources for the CPI files themselves. (Much of it is already there.)

> There's something that I would need to know for KEYB to handle
> this easily: which is the highest codepage number known?

May I refer you to the huge KBD.LST file containing all the keyboard
related news for the forthcoming issue of RBIL? I already sent you
a copy... For your convenience, here's one of the tables to be found
under INT 21h/AX=AD80h:

[Sorry for the extra long lines in this post]

|KEYB.COM keyboard layout IDs:
|
| ID: Code: Sub: Code pages:          Source: Description:                 Euro:
| ---  SG*    -  (2)                      [A]                                  -
|   0* SG*    1  850,437        [BCDEGHJLMNO] Switzerland (German)             E
| ---  CF*    -  (2)                      [A]                                  -
|  58* CF*    1  863,850           [BCEHIJKO] Canada (French)                  E
|                                             J: old
|                                             K: CAN/CSA-Z 243.200-92
|  58* CF*    1  850,863               [GLMN]                                  E
|  91* RU*    1  866,437,850,855,852,1251 [D] Russia                           -
|  91* RU*    1  866,437,850,855,1251     [O]                                  -
|  92* RU     2  866,437,850,855,1251     [D] Russia                           -
|  93* RU     2  866,437,850,855,1251     [D] Russia "Latin/Cyrillic"          -
|  94* UR*    1  866,437,850,855          [D] Russia                           -
|  95* UR     2  866,437,850,855          [D] Russia                           -
|  96* UR     2  866,437,850,855          [D] Russia                           -
|  97* BL*    1  866,437,850,855          [D]                                  -
|  98* BL     2  866,437,850,855          [D]                                  -
|  99* BL     2  866,437,850,855          [D]                                  -
| ---  XX*    -  (5)                      [A]                                  -
| 103* XX*    1  437,850,860,863,865                           [B]             -
| 103* XX*    1  437,850,852,860,863,865                   [CDEHJ]             -
| 103* US/XX* 1  437,850,852,860,863,865,866,855              [FO]             -
| 103* XX*    1  437,850,852,855,857,860,861,863,866,869       [G]             -
| 103* US/XX* 1  437,850,852,860,863,865,861                  [IK]             -
| 103* US/XX* 1  437,850,852,855,857,860,861,863,865,866,869 [LMN]           5,E
| 103  UX*    1  850,437                                       [N]           5,E
| 103* US/XX  1  437,850,852,860,863,865                       [P] "US-X"      -
| 103* US/XX  1  437,775,850,852,860,863,865,866,855           [R]             -
| 118* YC*    1  855,852               [GLMN]                                  -
| 118  YC*    1  855,850                  [K]                                  -
| ---  BE*    -  (2)                      [A] Belgium                          -
| 120* BE*    1  850,437       [BCDEGHIJLMNO] Belgium                          E
| 120* FR     2  437,850             [BCEHIJ] France                           -
| 120* FR     2  850,437               [GLMN] France                           E
| ---  GR*    -  (2)                      [A] Germany                          -
| 129* GR*    1  437,850          [BCDEHIJKO] Germany (DIN 2137 part 2)        -
|                                             J: Shiftlock
|                                             K: CAPSlock "GR-IBM"
| 129* GR*    1  850,437               [GLMN] Germany                          E
| ---  IT*    -  (2)                      [A]                                  -
| 141* IT*    1  437,850          [BCDEHIJKO] Italy                            -
| 141* IT*    2  850,437               [GLMN] Italy                            5
| 142* IT     2  437,850           [BCEHIJKO] Italy "IT142"                    -
| 142* IT     2  850,437               [GLMN] Italy                            5
| ---  NL*    -  (2)                      [A]                                  -
| 143* NL*    1  437,850           [BCDEHIJO] Netherlands                      -
| 143* NL*    1  850,437               [GLMN] Netherlands                      E
| 146  IT                                     Italy "IT146"                    ?
| ---  SF*    -  (2)                      [A]                                  -
| 150* SF*    1  850,437        [BCDEGHJLMNO] Switzerland (French)             E
| ---  SV*    -  (2)                      [A] Sweden                           -
| 153* SV*    1  437,850          [BCDEHIJKO] Sweden                           -
| 153* SV*    1  850,437               [GLMN] Sweden                           5
| 153  SU*    1  850,437      [BCDEGHIJKLMNO] Finland (Suomi)                5,E
| ---  NO*    -  (2)                      [A] Norway                           -
| 155* NO*    1  850,865         [BCEHIJKMNO] Norway                           5
| 155* NO*    1  850                     [GL]                                  5
| ---  DK*    -  (2)                      [A]                                  -
| 159* DK*    1  850,865        [BCDEHIJLMNO] Denmark                          5
| 159* DK*    1  850                      [G]                                  -
| 161* IS*    1  861,850                [IKO] Iceland                          -
| ---  PO*    -  (2)                      [A]                                  -
| 163* PO*    1  850,860        [BCEGHIJLMNO] Portugal                         5
| ---  UK*    -  (2)                      [A]                                  -
| 166* UK*    1  437,850          [BCDEHIJKO] United Kingdom                   -
| 166* UK*    1  850,437               [GLMN]                                  4
| 168* UK     2  437,850            [BCEHIJK] United Kingdom "UK168"           E
| ---  LA*    -  (2)                      [A]                                  -
| 171* LA*    1  850,437        [BCEGHIJLMNO] Latin America                    E
| ---  SP*    -  (2)                      [A]                                  -
| 172* SP*    1  850,437       [BCDEGHIJLMNO] Spain                          5,E
| 179* TR     2  857,850               [GLMN] Turkey                           E
| 179* TR     2  850,857                [IKO]                                  -
| ---  FR*    -  (2)                      [A]                                  -
| 189* FR*    1  437,850           [BCDEHIJO] France                           -
| 189* FR*    2  850,437               [GLMN] France                           E
| 190  TH                                    "TH190"
| 194  JP*    1  932,437               [CEHJ] Japan                            -
| 194* JP*    1  437,932               [GLMN]                                  -
| 194  JP*    1  437,932                 [OP]                                  -
| 194  AX*    1  437,932                  [M]                                  -
| 194  J3*    1  437,932                  [M]                                  -
| 196  AX*    1  437,932                  [P]
| 196  US*    1  437,932                  [P]
| 197* IC*    1  850,861               [GLMN] Iceland                          5
| 208* HU*    1  850,852            [CDEHJKO] Hungary                          -
| 208* HU*    1  852,850                  [G]                                  -
| 208  HU*    1  850,852                  [I]                                  -
| 208* HU*    1  852,912,850            [LMN]                                  -
| 210* LT        770,773,775,437          [Q] Lithuania (QWERTY EDV)           4
| 210* LT        775                      [R]
| 211* LT*       770,773,775,437          [Q] Lithuania (,A"ZERTY)             E
| 211* LT*       775                      [R]
| 212* LT        770,773,775,437          [Q] Lithuania (QWERTY Baltic)        E
| 212* LT        775                      [R]
| 214* PL*    1  850,852           [CDEHIJKO] Poland                           -
| 214* PL*    1  852,850                  [G]                                  -
| 214* PL*    1  852,912,850            [LMN]                                 5?
| 220  EL                                    "EL220"
| 234* YU*    1  850,852            [CDEHJKO] Yugoslavia (Latin)               -
| 234* YU*    1  852,850                  [G]                                  -
| 234  YU*    1  850,852                  [I]                                  -
| 234* YU*    1  852,912,850            [LMN]                                  -
| 234  SI*    1  852,912,850            [LMN] Slovenia                         -
| 234  BL*    1  852,912,850            [LMN] Bosnia/Herzegovina (Latin)       -
| 234  HR*    1  852,912,850            [LMN] Croatia                          -
| 241* BG     2  855,850               [GLMN] Bulgaria                         -
| 243* CZ*    1  850,852            [CDEHJKO] Czech Republic                   -
| 243* CZ*    1  852,850                  [G]                                  -
| 243  CZ*    1  850,852                  [I]                                  -
| 243* CZ*    1  852,850,912            [LMN]                                  -
| 245* SL*    1  850,852            [CDEHJKO]                                  -
| 245* SL*    1  852,850                  [G]                                  -
| 245  SL*    1  850,852                  [I]                                  -
| 245* SL*    1  852,912,850            [LMN]                                  -
| 258  IS
| 259  EL
| 274* BR*    1  850,437              [CEHJO] Brazil                           -
| 274* BR     2  850,437               [GLMN]                                  -
| 274* BR*    2  850,437                 [IK]                                  -
| 275* BR*    2  850,437               [GLMN] "BR-2"                           -
| 275* BR     1  850,437                 [IK]                                  -
| 319* GK*    1  869,850               [GLMN] Greece                           -
| 319* GK*    1  869,737                 [IO]                                  -
| 319* GK*    1  869,737,850              [K]                                  -
| 333  RO*    1  850,852                [IKO] Romania                          -
| 341* RU     2  866,437,850,855,1251     [D] Russia (Latin/Cyrillic)          -
| 425* ET*    1  775,850                  [O]                                  -
| 440* TR*    2  857,850               [GLMN] Turkey "TR440"                   E
| 440* TR*    1  850,857                [IKO] Turkey                           -
| 441* RU     3  866,855,850           [GLMN] Russia "RU441"                   -
| 441  RU*    1  866,850,855              [K]                                  -
| 442* BG*    2  855,850                  [G] Bulgaria                         -
| 442* BG*    2  866,850,855              [K]                                  -
| 442* BG*    2  915,855,850            [LMN]                                  -
| 442* BG*    1  866,850,855              [O]                                  -
| 443  RU                                     Russia (documentation only!)     -
| 446* RO*    1  852,850                  [G] Romania                          -
| 446* RO*    1  852,912,850            [LMN] Romania                          -
| 448* AL*    1  852,850                  [G]                                  -
| 449* MK*    1  915,855,852            [LMN] FYR of Macedonia                 -
| 450  BC*    1  915,855,852            [LMN] Bosnia/Herzegovina (Cyrillic)    -
| 450* SB*    1  915,855,852            [LMN] Serbia/Montenegro                -
| 452* AL*    1  850,437                [LMN] Albania                          -
| 453* GR     1  850,437                [LMN] Germany (DIN 2137) "DE"/"DE453"  E
| 457  PL                                     "PL457"                         5?
| 458  IS                                     Iceland "IS458"                  5
| 459  EL                                     "EL459"
| 470  AA                                     "AA470"
| 985* DV*    1  437,850                 [FO] Dvorak "USDV"                    -
| 986* LH*    1  437,850                  [F] Left handed Dvorak "USDVL"       -
| 987* RH*    1  437,850                  [F] Right handed Dvorak "USDVR"      -
|
| [A] MS-DOS 3.30 KEYBOARD.SYS
| [B] PC DOS 4.00, MS-DOS 4.01 KEYBOARD.SYS
| [C] MS-DOS 5.0 KEYBOARD.SYS
| [D] Russian MS-DOS 5.0 KEYBOARD.SYS
| [E] MS-DOS 6.0 KEYBOARD.SYS
| [F] MS-DOS 6.0, MS-DOS 6.20, MS-DOS 6.22 DVORAK.SYS
| [G] PC DOS 6.1 KEYBOARD.SYS
| [H] MS-DOS 6.20 KEYBOARD.SYS
| [I] MS-DOS 6.20 KEYBRD2.SYS
| [J] MS-DOS 6.22, Chinese MS-DOS 6.22, MS-DOS 7.10 (Windows 95 OPK3, 98,
|     98SE / 98ZA), MS-DOS 8.0 (Windows ME) KEYBOARD.SYS
| [K] MS-DOS 6.22, Chinese MS-DOS 6.22, MS-DOS 7.10 (Windows 95 OPK3, 98,
|     98SE / 98ZA), MS-DOS 8.0 (Windows ME) KEYBRD2.SYS
| [L] PC DOS 7 KEYBOARD.SYS
| [M] PC DOS/V 7 KEYBOARD.SYS
| [N] PC DOS 2000 KEYBOARD.SYS
| [O] Windows 2000 (NT5) KEYBOARD.SYS and KEY01.SYS
| [P] Japanese MS-DOS 6.20 JKEYBRD.SYS
| [Q] KADA W98LT 4.16 KEYBRDL.SYS
| [R] KADA WNLT 4.13 for NT4 KEYBRDL.SYS
|
|Notes:  According to documentation PC DOS 7-2000 no longer support the
|          UK /ID:168 keyboard layout, but the KEYBOARD.SYS files still
|          contain entries for it.
|        Since some layout definitions are only used internally, a star (*)
|          indicates that the layout is actually addressable under this
|          ID or name
|        The Code Pages are given in the priority they have in the
|          corresponding KEYBOARD.SYS file.
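
As an illustration of how such a priority list could be used, here is a
small sketch that picks the first listed Code Page that is actually
available. The three rows are copied from the [GLMN]/[LMN] entries of
the table above; the selection rule itself and the names
LAYOUT_CODEPAGES and pick_codepage are assumptions for illustration,
not a description of what any real KEYB does.

```python
# Code Pages per layout code, in the priority order given in the table
# (rows for SP 172, GR 129 [GLMN] and PL 214 [LMN]).
LAYOUT_CODEPAGES = {
    "SP": (850, 437),
    "GR": (850, 437),
    "PL": (852, 912, 850),
}

def pick_codepage(layout, available):
    """Return the highest-priority Code Page of `layout` that is in
    `available`, or None if none of them is available."""
    for cp in LAYOUT_CODEPAGES[layout]:
        if cp in available:
            return cp
    return None
```

For example, with only 850 and 437 prepared, the PL entry would fall
back to 850 because neither 852 nor 912 is available.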

I hope this answers several of your and Henrique's questions.

> ( my wish: below 4000
> my second wish: below 8000
> my last wish: below 16000 :-((()

Country codes, Code Page IDs, and Keyboard Layout IDs are 16-bit values
and should be treated as such. Although by far not all of them were or are
used by Microsoft and IBM, the highest assignable Code Page number is
65533. 0, 65534 and 65535 are reserved as they have special meanings
for the OS itself (see below).
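
The rule above can be written down as a simple range check. This is
only a sketch of the statement in this paragraph (16-bit values, with
0, 65534 and 65535 reserved); the function name is made up.

```python
RESERVED_CODEPAGES = {0, 0xFFFE, 0xFFFF}   # 0, 65534, 65535

def is_assignable_codepage(cp):
    """True if `cp` fits in 16 bits and is not one of the reserved IDs."""
    return 0 <= cp <= 0xFFFF and cp not in RESERVED_CODEPAGES
```

A table indexed by Code Page ID therefore has to cope with values up to
65533, not just three- or four-digit numbers.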

AFAIR, Microsoft's MODE does not accept Code Page numbers higher than
999 (only a question of command line parsing, no technical limitation).
DR-DOS MODE displays high Code Page numbers as negative numbers due to
a signed/unsigned oversight (only a cosmetic issue).
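
The signed/unsigned effect is easy to reproduce: any 16-bit value of
32768 or above turns negative when the same bits are read as a signed
number. A small sketch (the helper name is made up, and this only
illustrates the arithmetic, not the MODE source):

```python
import struct

def as_signed16(value):
    """Reinterpret an unsigned 16-bit value as a signed 16-bit value."""
    return struct.unpack("<h", struct.pack("<H", value))[0]

print(as_signed16(850))     # 850
print(as_signed16(58194))   # -7342
print(as_signed16(65533))   # -3
```

Printing the ID as an unsigned value avoids the problem.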

But this does not mean they cannot exist - actually, there are *many*
(hundreds!) Code Pages with much higher values. Please have a look
at the huge table in CODEPAGE.LST which I already sent you as well.
I wonder a bit, why I spent so much time collecting all this info
and maintaining these lists, when apparently no one reads them...

> CHCP is an internal program that calls kernel, which calls NLSFUNC.

Indirectly, yes. It calls the DOS kernel, which will usually call
down to NLSFUNC, which will then call back into DOS to retrieve the
info (for file I/O only). Once the info has been looked up, NLSFUNC
will return it to the DOS kernel, which will then again call
NLSFUNC in order to switch the codepage. NLSFUNC will then ask
any character device driver in the system if it supports codepage
switching. Any driver supporting codepage switching (like DISPLAY.SYS
or PRINTER.SYS, for example), will then be advised to switch the
codepage. If they return an error, NLSFUNC will return an error as
well. DISPLAY.SYS internally will also communicate with ANSI.SYS
and KEYB in order to switch display and keyboard codepages
(ANSI.SYS is not called for codepage switching as is, only for
communicating display properties).
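
The device part of this sequence can be modelled roughly as follows.
This is a toy model, not real driver code: the CharDevice class and
nlsfunc_switch are hypothetical names, and it assumes that the
remaining drivers are still asked after one of them has failed, which
is not stated above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CharDevice:
    name: str
    prepared: Optional[frozenset] = None   # None: no codepage support
    active: Optional[int] = None

    def select(self, cp):
        """Switch to `cp` if it has been prepared; report success."""
        if cp not in self.prepared:
            return False
        self.active = cp
        return True

def nlsfunc_switch(devices, cp):
    """Ask every codepage-capable device to switch to `cp`.
    Returns False if any of them reported an error."""
    ok = True
    for dev in devices:
        if dev.prepared is None:    # driver does not support switching
            continue
        if not dev.select(cp):
            ok = False              # assumption: keep asking the others
    return ok
```

In this model a single driver without the requested Code Page prepared
is enough to make the whole switch report an error, even though other
devices have already changed.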

> If I am not wrong, NLSFUNC would care of all that (including
> DISPLAY), and change all of that in a consistent manner*.
> The problem is that we do not have a NLSFUNC program :-(((

Yes.

> (*) There is a case which, in my opinion, leads to inconsistency.
> DISPLAY.SYS is responsible for changing keyboard codepage too.
> Microsoft's implementation will switch the screen codepage regardless
> if KEYB managed to change codepage or not, which means that it would
> leave screen and keyboard with different codepages if KEYB failed.
> In my opinion, this is a bug.

In my opinion, too, but I would simply implement an option into
DISPLAY to control the behaviour - so the decision is up to the user.
I suggest to use /E for this purpose because it would somewhat correlate
with an option supported by my internal issue of DR-DOS NLSFUNC:

|NLSFUNC R4.07 (001014)  National Language Support
|Copyright (c) 1988,1998 Caldera, Inc.  All rights reserved.
|Copyright (c) 1997,2000 Matthias Paul. All rights reserved.
|
|NLSFUNC [[d:]path] [/Help] [/B][/E][/F] [/MH|/MU|/ML|/L|/NOHMA] [/N][/V][/X]
|
|  d:path        Filespec of local COUNTRY.SYS database (default: system file)
|  /B            Search both, local and system NLS databases for requested data
|  /E            Do not report device driver code page switching errors
|  /F            Override warnings and force NLSFUNC to load or update filespec
|  /MH           Load and relocate NLSFUNC into High Memory (HMA)
|  /MU           Load and relocate NLSFUNC into XMS Upper Memory (XMSUMB)
|  /ML           Load NLSFUNC as classical TSR (Conventional or Upper Memory)
|  /L or /NOHMA  Similar to /ML, but prohibit relocation into High Memory (HMA)
|  /N            Do not bypass NLSFUNC on detection of SHIFT+CTRL+ALT hotkey
|  /V            Display verbose messages (default: warnings and errors) (ALT)
|  /X            Always load advanced COUNTRY.SYS file support
|
|Installing NLSFUNC without giving any of the /Mx switches will try to relocate
|it into High Memory or XMS Upper Memory, or to load it as a classical TSR. Use
|of a combination of /Mx switches will override the default. Use of the HILOAD
|NLSFUNC /L syntax will try to load NLSFUNC into Upper Memory (UMB) as a TSR.
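
Coming back to the screen/keyboard consistency question discussed
before the NLSFUNC help text: a user-selectable policy in DISPLAY could
look roughly like this. The `strict` flag, the state dictionary and the
keyb_select callback are hypothetical; this is a sketch of the idea,
not of how DISPLAY.SYS or KEYB actually talk to each other.

```python
def change_codepage(cp, state, keyb_select, strict=True):
    """Switch keyboard and screen to `cp`.

    state        dict with the active 'screen' and 'keyboard' codepages
    keyb_select  callable(cp) -> bool, True if KEYB could switch
    strict       if True, leave the screen alone when KEYB fails

    Returns True only if both ended up on `cp`.
    """
    keyb_ok = keyb_select(cp)
    if keyb_ok:
        state["keyboard"] = cp
    elif strict:
        return False            # nothing changed, still consistent
    state["screen"] = cp        # non-strict: switch screen regardless
    return keyb_ok
```

With strict=False the function reproduces the behaviour criticised
above (screen switched although the keyboard was not); with strict=True
a failing KEYB leaves both on the old Code Page.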

On 2002-11-20, Arkady V.Belousov wrote: 

> This program is not required. It is required only if you wish to
> switch codepages "on the fly", but if you work only with one
> codepage, you may (should) initialize it with a COUNTRY= statement.
> Of course, in MS-DOS, MODE and KEYB without NLSFUNC loaded will
> fail to load fonts/layouts other than those specified in COUNTRY=.

This is correct, but still, a COUNTRY.SYS file parser is needed
not only for NLSFUNC, but also for FreeDOS' DOS BIOS.

In older issues of DR DOS, NLSFUNC has been an integral part
of the kernel, and the disk file was only a dummy for programs
expecting it to be there. But then the code moved into the
DOS BIOS, where it will get discarded after init (the driver
is temporarily linked in during the processing of the COUNTRY
directive), and into the file parsing portion of the external
NLSFUNC driver for later use.

On 2002-11-21, Axel C. Frinke wrote:

> I've heard of a proposal about 'user definable codepage IDs' to
> assign IDs above 0xF000 to codepages without official IDs. But
> I don't like to assign such a number to a wide-spread codepage
> like KOI8-R.

According to IBM's Character Data Representation Architecture
(CDRA level 2) there are two special areas within the 16-bit
Code Page ID space for variations of existing Code Pages and user
or OEM definable Code Pages. This is exactly the way to go until
IBM assigns an official ID for a new Code Page. Everything
else undermines the system, which I think, is a bad idea, even
though there are quite a large number of Code Pages which have
been assigned without first checking with IBM and there are
still many Code Pages not having official Code Page IDs, yet.

That is also why I withdrew my proposed Code Page ID for the
new variant of Code Page 850 with Euro under ID "8501" (which
I issued before I knew about IBM CDRA). This codepage is now
officially called CP 858.

From my NECPINW.CPI docs:

| My previous proposal for this EURO SIGN-variant of Code Page 850
| was Code Page 8501, while the IBM CDRA level 2 standard reserves
| the range E000h..EFFFh for user definable CCSIDs (that is,
| "Code Pages" here). NECPINW.CPI can still provide the Code Page
| under both IDs (EURO_8501 conditional), but IBM has meanwhile
| assigned ID 858, making my previous proposal obsolete. Hence,
| support for 8501/58194 may vanish, use 858 instead.

and

| The IBM CDRA level 2 standard reserves Code Page IDs FF00h..FFFEh
| for user definable "private use" assignments.

This means, Code Pages in the FF00h..FFFEh (or better FFFDh) range may
vary completely from user to user, device to device, and/or manufacturer
to manufacturer. So, switching to them via CHCP does not necessarily
produce reasonable results, depending on circumstances. Switching to
them via MODE dev: CODEPAGE SELECT=nnnnn will still work fine, as
you can select different Code Pages for different devices then.

But if you want to assign something "new" or "special", this is the
range to use, and you are completely free in using this space as
you like and can even create your own bit patterns within that range.
By definition, these assignments are private, so it is no problem
if different people assign different Code Pages to identical IDs.

The range E000h..EFFFh is used for variations of existing Code Pages,
and if possible should be assigned so that the LSBs are still
matching the parent Code Page. That's why NECPINW.CPI also supported
the new variant of Code Page 850 with Euro sign under ID 58194
(CP 850 = 0352h, CP 58194 = E352h).
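
The arithmetic behind these ranges, as a small sketch. The function
names are made up, the private range is capped at FFFDh because 65534
is reserved (see above), and the parent lookup only works for parents
that fit into the low 12 bits.

```python
def variant_id(parent):
    """ID in E000h..EFFFh whose low bits match the parent Code Page."""
    if not 0 < parent <= 0x0FFF:
        raise ValueError("parent does not fit into 12 bits")
    return 0xE000 | parent

def parent_of(cp):
    """Parent Code Page of a variant ID, or None outside E000h..EFFFh."""
    if 0xE000 <= cp <= 0xEFFF:
        return cp & 0x0FFF
    return None

def classify(cp):
    """Rough classification of a 16-bit Code Page ID."""
    if cp in (0, 0xFFFE, 0xFFFF):
        return "reserved"
    if 0xE000 <= cp <= 0xEFFF:
        return "variant"
    if 0xFF00 <= cp <= 0xFFFD:
        return "private"
    return "other"

print(variant_id(850))    # 58194 (E352h)
print(parent_of(58201))   # 857
```

This is the reason a variant can be mapped back to its parent with a
single mask instead of a lookup table.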

On 2002-11-21, Arkady V.Belousov wrote:

> Subject: Re: [fd-dev] ISO-Latin and 4-digit codepages; arabic cp720

>> Let me express it more precisely: it would be handy to have all
>> codepage number below 4098.
>
> As stated by Matthias, DR-DOS assigns for code pages with euro
> sign some very big values.

My NECPINW.CPI does, DR-DOS does not.

However, DR-DOS /does/ support /Country Codes/ much larger than 999
in order to support entries with the ISO 8601 international date
format and/or Euro currency. This is a proprietary extension of
DR-DOS. Since I have already explained the patterns (a MOD 1000 system)
and range definitions for this scheme, I won't go into the details here.

Axel C. Frinke wrote:

> Well, with the assumption that all codepage IDs would not take more
> than 10 bits, there would be 6 remaining bits to denote variations of
> the existing codepages. If I remember correct, the DR-DOS method
> denotes codepage 858 as 20850.

No. DR-DOS does not support codepages with Euro sign (yet) although,
somewhat ironically, the symbol is part of the font database.

Nothing new to you, Axel, but maybe still an interesting bit of trivia
for the others:

The Euro currency support in DR-DOS 7.02+ pre-dates the Euro currency
support in IBM PC DOS 2000 by several months (IIRC, I implemented it
in 1997-11) and it was still not completely clear where to introduce
the character on the keyboard layout. I had tried to discuss the matter
and find a solution with several keyboard vendors beforehand, but
back in 1996 - 1997 they all said, we'll wait and see what Microsoft
will do... ;->

Code Page 858 "as is" was not defined at that time. Maybe I was
uninformed, but I did not hear about this ID before fall 2000.

Looking back, the first reference I can now find about it is dated
1998-04-30 (PC DOS 2000 files), and looking this up in my records,
the Euro sign was added to the DR-DOS font database on 1998-05-01
on my behalf. I had read about it in magazine articles a few weeks
earlier and only learnt about PC DOS 2000 a few months afterwards. ;-)

Still, the Euro variant of the codepage, which IBM introduced in
PC DOS 2000 (somewhat incompatibly) resides under the ID 850, not 858.

The Euro currency support in DR-DOS is bound to using alternative
country codes, which, I think, was a bit cumbersome but reasonable
at the time, because during the transitional phase of the European
Monetary Union (EMU), you had to be able to switch back and forth
easily between the local currency and the forthcoming Euro all the
time, so the old and the new country codes could be easily retrieved
by adding or subtracting multiples of 1000 from the current value.
This gave several possible values, not all of which have actually
been used. So, you have values with Euro sign, with international
date format according to ISO 8601, and with both, depending on
personal preferences or local standards (for example, the corresponding
DIN EN 28601 has been mandatory in Germany since 1996-05-01, although
most people still use the old 1.5.1996 date format).
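
The "multiples of 1000" idea can be sketched as below. Which multiple
selects the Euro sign, the ISO 8601 date format, or both is not given
in this message, so the sketch only shows how the base code and the
candidate variants relate; the function names and the default limit of
three variants are assumptions.

```python
def base_country_code(code):
    """Strip the variant part: the classic country code is code MOD 1000."""
    return code % 1000

def related_country_codes(code, variants=3):
    """Base code plus the first `variants` codes reachable by adding
    multiples of 1000 (not all of them need to be defined)."""
    base = base_country_code(code)
    return [base + k * 1000 for k in range(variants + 1)]

print(base_country_code(1049))      # 49
print(related_country_codes(49))    # [49, 1049, 2049, 3049]
```

Switching between the local-currency entry and one of its variants is
then a matter of adding or subtracting a multiple of 1000, with no
lookup table.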

If you selected a Euro-enabled country code under DR-DOS 7.02, the
currency was still displayed as "(=" (under PC DOS 2000 still "DM",
BTW). A few months earlier, I had searched the web and asked at several
German financial institutions what the official abbreviation for
the forthcoming Euro would be, but at that time, they still couldn't
give me a definite answer (I guess at least some of them already
knew what they would use, but didn't want to make a formal
statement). So instead of using "EUR" and risking a wrong string in
the end, I used "(=". Shortly before the release of DR-DOS 7.02 I
received the definite answer that "EUR" would be used, but
unfortunately it took some more months before the immediately updated
COUNTRY.SYS file became public with DR-DOS 7.03.

Today, this system of doubled country codes is obsolete, and the old
entries could be updated as the old currencies are no longer in use.

> For automatted processing it is much easier to subtract 20000 than
> looking up in an additional lookup table for variants of codepages.

As explained, DR-DOS uses a similar system for Country Codes, but not
for Code Pages. Still, IBM CDRA reserves the E000h..EFFFh codepage range
for a very similar purpose.

From my NECPINW.CPI docs (just to give an example, not representative,
and of course far from a complete list of Code Pages - not all of
them are even defined in CDRA level 2):

| 00D2h   210  Greek
| 016Fh   367  7-bit ISO 646 (US)
| 01B5h   437  International, USA, IBM-2, PC-8, World Trade
| 029Bh   667  Polish (Mazovia) (=CP 991)
| 02E1h   737  Greek
| 0352h   850  Multilingual, Latin I
| 0354h   852  Slavic, Eastern Europe (Latin II)
| 0355h   853  Turkish (Latin II)
| 0357h   855  Cyrillic I
| 0359h   857  Turkish (=CP 58201)
| 035Ah   858  Multilingual, Latin I with EURO SIGN
| 035Ch   860  Portuguese
| 035Fh   863  French Canadian
| 0361h   865  Nordic, Norway II, Danish
| 0362h   866  Russian, Cyrillic II
| 0363h   867  Czech (Kamenicky) (=CP 895)
| 037Fh   895  Czech (Kamenicky) (=CP 867)
| 03DFh   991  Polish (Mazovia) (=CP 667)
|         999  Dummy placeholder for hardware Code Page
| [...]
| 2135h  8501  (Multilingual, Latin I with EURO SIGN)
| E352h 58194  "" (=CP 8501)
| E359h 58201  Turkish with EURO SIGN at D5h (=CP 857)
| [...]
| E5B5h 58805  CP 437 variant with EURO SIGN at 9Fh
| E69Bh 59035  CP 667 variant with EURO SIGN at 9Fh (=CP 59359)
| E752h 59218  CP 850 variant with EURO SIGN at 9Fh
| E75Fh 59231  CP 863 variant with EURO SIGN at 9Fh
| E761h 59233  CP 865 variant with EURO SIGN at 9Fh
| E7DFh 59359  CP 991 variant with EURO SIGN at 9Fh (=CP 59035)
| [...]

Again, 8501 has meanwhile been withdrawn and will no longer be
supported in future issues of NECPINW.CPI.
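
The E000h..EFFFh entries quoted above show a visible pattern: 58194 and
58201 are E000h plus the base Code Page (0352h, 0359h), while the
"EURO SIGN at 9Fh" variants are E400h plus the base Code Page (01B5h,
029Bh, 0352h, ...). A rough Python sketch of splitting such an ID
follows. This is only an observation on the rows listed here, not a
documented rule for the whole range; the function name and the
10-bit/6-bit split are assumptions made for illustration.

```python
PRIVATE_FIRST = 0xE000
PRIVATE_LAST = 0xEFFF

# Offsets observed in the excerpt above (not a complete or official list).
OBSERVED_OFFSETS = {
    0xE000: "variant of the base Code Page (e.g. 58194, 58201)",
    0xE400: "variant with EURO SIGN at 9Fh (e.g. 58805, 59218)",
}


def split_private_codepage(cp):
    """Return (base_codepage, offset) for an ID in E000h..EFFFh, else None.

    Assumption: the base Code Page fits in the lower ten bits (below 1024),
    which holds for every row in the excerpt.
    """
    if not PRIVATE_FIRST <= cp <= PRIVATE_LAST:
        return None
    offset = cp & 0xFC00  # upper six bits: E000h or E400h in the excerpt
    base = cp & 0x03FF    # lower ten bits: the base Code Page number
    return base, offset
```

With this, 59218 (E752h) splits into base 850 and offset E400h, and
58201 (E359h) into base 857 and offset E000h, matching the table rows.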

Please note that the names of Microsoft's "Latin" codepages use
Roman numerals (I, II, III) rather than Arabic digits (1, 2, 3) to
distinguish them from ISO codepages, which /do/ use Arabic digits.

On 2002-11-21, Oleg Deribas wrote:

> I don't know is it official or not, but KOI8-R have it's own codepage
> number. In IBM OS/2 it is known as CP878.

Very interesting, as it just fills a gap in my CODEPAGE.LST file: :-)

| Index CCSID CPGID/ ES/   CS/    F/M/S  Name & Comments
|              CP   ESID  GCSGID
| (hex) (dec) (dec) (hex) (dec)   (dec)
| 0000h   -       0   -     -       -    Reduced 7-bit ASCII
|                                        (cannot be directly accessed by DOS)
| 0000h   -       0   -     -       -    (internally reserved by DR DOS)
| 0000h   -   00000                      reserved for special purposes
| 0000h 00000   -     -     -       -    "Inheritance from a higher level"
| [...]
| 036Bh 00875 00875 1100h 00925 M(00184) IO/Group 1a: EBCDIC: Greek
|       00878                            ??? [OS/2 Warp 3 FixPak 40]
| 0370h 00880 00880 1100h 00960 F(00190) CM/Group 1a: Cyrillic Multilingual
|               880                      Russian (Cyrillic GOST)
|                                        EBCDIC: Cyrillic
|                                        Names (RFC1345): "IBM880", "cp880",
|                                        "EBCDIC-Cyrillic"
|                                        (SeeAlso: CCSID 04976)
| [...]

For comparison purposes, can you provide a full encoding vector for what
IBM implements in CP 878 (preferably in Unicode notation)?

>>> BTW, to complicate case the more, there is another KOI8 -
>>> KOI8-U (Ukrainian KOI8). You may see the differences with KOI8-R
>>> in RFC2319. 
>> I will take a look at it by chance. Thanks.
>
> And there is official Ukrainian DOS codepage - CP1125. It is similar
> to Russian CP866, but contains all Ukrainian characters.
> BTW, in Epson printers CP1125 called CP866-Ukr for some reason ;)

| [...]
| 0464h 01124 01124 4100h 01326 F(00190) CM/Group 1a: Cyrillic Ukraine 8-Bit
|       01125                            ??? [OS/2 Warp 3 FixPak 40]
|        1129                            SBCS: Vietnamese [IBM PC]
|       01131                            ??? [OS/2 Warp 3 FixPak 40]
|             01132                      EBCDIC: Laotian [IBM, Unicode proposal
|                                                                       1998-05]
|             01133                      SBCS: ASCII Laotian (ISO-8 based) [IBM,
|                                                       Unicode proposal 1998-05]
| [...]

Yet another match, it seems. Thanks! :-)

In regard to the areas E000h..EFFFh and FF00h..FFFEh, another excerpt of the
end of CODEPAGE.LST:

| [...]
| C1B5h 49589 00437 3100h 00980 S(00097) CM/Group 1: PC Display; United Kingdom
| C1F4h 49652 00500 1100h 01114 S(00160) CM/Group 1: Belgium
| D1F4h 53748 00500 1100h 00103 S(00094) CM/Group 1: International DP94
|       57344..61439 var.  var.   var.   CCSID: reserved for private/customer use
|       61440..61695 var.  var.   var.   CCSID: reserved for future allocation by CDRA
|       61696..61951 var.  var.   var.   CCSID: reserved for Global Use CCSIDs
| F100h 61696 00500 1100h 00640 S(00081) Global Use: Syntactic CS in SBCS EBCDIC
|                                        (CP 00500 is used in the CDRA CCSID registry.
|                                        Any other CP, such as 00037, that has an
|                                        associated ESID 1100h and respects the
|                                        invariance for CS 00640, may also be used.)
| F101h 61697 00850 2100h 00640 S(00081) Global Use: Syntactic CS in SBCS PC Data
| F102h 61698 00850 3100h 00640 S(00081) Global Use: Syntactic CS in SBCS PC Display
| F103h 61699 00819 4100h 00640 S(00081) Global Use: Syntactic CS in SBCS ISO-8
| F104h 61700 00367 5100h 00640 S(00081) Global Use: Syntactic CS in SBCS ISO-7
| F10Eh 61710 00819 4100h 01274 S(00073) Global Use: Dual case printable graphics of
|                                        ASN.1 in SBCS ISO-8; it includes: A to Z,
|                                        a to z, 0 to 9, and + = ' ( ) , - . / : ?
|                                        This CCSID corresponds to ASN.1 (ISO 8824)
|                                        "Printable String" and its encoding in SBCS
|                                        ISO-7 and ISO-8 codes.
| F10Fh 61711 00500 1100h 01274 S(00073) Global Use: Dual case printable graphics of
|                                        ASN.1 in SBCS EBCDIC; it includes: A to Z,
|                                        a to z, 0 to 9, and + = ' ( ) , - . / : ?
|                                        This CCSID corresponds to ASN.1 (ISO 8824)
|                                        "Printable String" characters encoded in
|                                        SBCS EBCDIC codes.
|                                        (CP 00500 is used in the CDRA CCSID registry.
|                                        Any other CP, such as 00037, that has an
|                                        associated ESID 1100h and respects the
|                                        invariance for CS 01274, may also be used.)
| F110h 61712 00500 1100h 01134 S(00036) Global Use: SNA character set, type AR
|                                        (A to Z, and 0 to 9).
|                                        (CP 00500 is used in the CDRA CCSID registry.
|                                        Any other CP, such as 00037, that has an
|                                        associated ESID 1100h and respects the
|                                        invariance for CS 01134, may also be used.)
|       61952..62207  -     -      -     CCSID: reserved for Request for Price
|                                        Quotation (RPQ)
|       62208..65533  -     -      -     CCSID: reserved for future allocation by CDRA
|       65024..65279  -     -      -     CPGID/CP: reserved for Request for Price
|                                        Quotation (RPQ)
|       65280..65534  -     -      -     CPGID/CP: reserved for customer use
|         -   65400                      reserved for Glyphes [IBM OS/2]
| FFFEh   -   65534   -     -      -     (internally reserved by DR DOS)
| FFFEh 65534   -     -     -      -     "Inheritance from a lower level"
| FFFFh   -   65535   -     -      -     (internally reserved by DOS and DR DOS)
| FFFFh   -   65535   -     -      -     reserved for special purposes
| FFFFh 65535   -     -     -      -     "CCSID not applicable"
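
For automated processing, the CCSID ranges and special values from the
excerpt above could be checked roughly like this. It is a Python sketch
that covers only the rows quoted here; the function name and the
fallback label are my own and not part of CDRA.

```python
# (first, last, meaning) for the CCSID ranges listed in the excerpt above.
CCSID_RANGES = [
    (57344, 61439, "reserved for private/customer use"),
    (61440, 61695, "reserved for future allocation by CDRA"),
    (61696, 61951, "reserved for Global Use CCSIDs"),
    (61952, 62207, "reserved for Request for Price Quotation (RPQ)"),
    (62208, 65533, "reserved for future allocation by CDRA"),
]

# Single CCSID values with a special meaning, as listed in the excerpt.
CCSID_SPECIAL = {
    0: "Inheritance from a higher level",
    65534: "Inheritance from a lower level",
    65535: "CCSID not applicable",
}


def classify_ccsid(ccsid):
    """Return the meaning of a 16-bit CCSID according to the excerpt."""
    if not 0 <= ccsid <= 0xFFFF:
        raise ValueError("CCSIDs are 16-bit values")
    if ccsid in CCSID_SPECIAL:
        return CCSID_SPECIAL[ccsid]
    for first, last, meaning in CCSID_RANGES:
        if first <= ccsid <= last:
            return meaning
    return "not in any range listed in this excerpt"
```

For instance, classify_ccsid(61697) reports the Global Use range, and
classify_ccsid(58194) reports private/customer use.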

Hope it helps,

 Matthias

-- 
<mailto:[EMAIL PROTECTED]>; <mailto:[EMAIL PROTECTED]>
http://www.uni-bonn.de/~uzs180/mpdokeng.html; http://mpaul.drdos.org

"Programs are poems for computers."

