Re: [Cjk] CJKvert and horizontal dash transformation?

Werner LEMBERG Wed, 25 Jul 2007 10:21:10 -0700

> 1. I use c42min.fd and JISdnp.enc to work out what the original JIS
>    point was from the DNP symbol subfont glyph position. I know the
>    JIS is encoded in EUC (or DNP) not in JIS encoding, but I don't
>    know exactly which EUC. BTW, EUC is also known as UJIS, that is,
>    Unixized JIS.


Hmm.  It is *much* easier if you use DNP.sfd instead IMHO.

> 2. I could not understand yet exactly which EUC is used, so I
>    assumed the complete two-byte format EUC for now, based on the
>    fact that the first byte values seem to match the example you
>    gave me (A1A1, assumed to be unbreaking space). This form is
>    apparently not commonly encountered, according to my old
>    reference. How times have changed :-)

There is only a single EUC!  This is an encoding template, to be
filled with proper character sets; thus we have EUC-JP, EUC-KR,
EUC-CN, EUC-TW, and probably others.  In the CJK package, `EUC-JP.enc'
and `EUC-TW.enc' use so-called `single shift escape sequences' (three-
and four-byte sequences starting with character code 0x8E) to access
character sets other than JIS X 0208 and the first plane of
CNS 11643-1992, respectively.  We can ignore this here -- you might
read the ISO 2022 specification for the nasty details of single
shifts.  `standard.enc' holds a simplified variant of EUC-JP without
single shifts (using only two-byte characters in the range
0xA1A1-0xFEFE); it can also be used for EUC-KR and EUC-CN, which
don't use single shifts at all.

However, `standard.enc' assumes a certain TeX subfont layout (the 94
encoding planes with 94 characters each, packed as tightly as possible
into the 256 glyph slots available per subfont) which the DNP fonts
don't obey.
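
To make the layout difference concrete, here is a minimal Python
sketch.  It assumes that `tightly packed' means numbering the 94x94
characters consecutively and cutting that sequence into chunks of 256;
this is only one reading of the paragraph above, not something checked
against `standard.enc', and the function names are made up for
illustration.

```python
def euc_to_kuten(code):
    """Split a two-byte EUC code (0xA1A1-0xFEFE) into (row, cell), each 1-94."""
    hi, lo = code >> 8, code & 0xFF
    if not (0xA1 <= hi <= 0xFE and 0xA1 <= lo <= 0xFE):
        raise ValueError("not a two-byte EUC code: 0x%04X" % code)
    return hi - 0xA0, lo - 0xA0


def packed_slot(code, slots=256):
    """ASSUMPTION: consecutive numbering, cut into subfonts of `slots` glyphs.

    Returns (subfont number counted from 0, glyph index in that subfont).
    """
    row, cell = euc_to_kuten(code)
    linear = (row - 1) * 94 + (cell - 1)
    return divmod(linear, slots)
```

Under that assumption `packed_slot(0xA3A1)' gives (0, 188) and
`packed_slot(0xA3FE)' gives (1, 25), so a plane can straddle two
subfonts; a layout that keeps each plane inside one named subfont
cannot be described this way.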

> 3. Now, I realize that the conversion of subtracting -160 from the
>    second byte, and making the first byte A1, is in fact the KUTEN
>    <-> JIS conversion!

No, it isn't.  Have a look at the first lines from DNP.sfd:

  sy         1: 0xA1A1_0xA1FE 101: 0xA2A1_0xA2FE
  roma      33: 0xA3A1_0xA3FE

The subfont called `sy' maps input characters 0xA1A1-0xA1FE to glyph
indices 1-94 (glyph index 0 is unused) and 0xA2A1-0xA2FE to 101-194.
The `roma' subfont maps 0xA3A1-0xA3FE to 33-126.

>    That is, whatever the DNP coding might be elsewhere (I did not
>    check), it appears to be the KUTEN index encoding for at least
>    the sy subfont.

As you can see above, the `sy' subfont holds two planes (0xA1 and
0xA2); the offset for the first plane is -160, while it is -60 for the
second.  Your further conclusions are thus incorrect.
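
The two DNP.sfd lines quoted above can be expanded mechanically.  The
following Python sketch handles only the syntax visible in those two
lines (a subfont name, `N:' offsets, and `start_end' ranges); anything
else a real SFD file may contain is not covered, and the name
`parse_sfd' is just a placeholder.

```python
SFD_TEXT = """\
sy         1: 0xA1A1_0xA1FE 101: 0xA2A1_0xA2FE
roma      33: 0xA3A1_0xA3FE
"""


def parse_sfd(text):
    """Return {input code: (subfont name, glyph index)} for SFD-style lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name, slot = fields[0], 0
        for field in fields[1:]:
            if field.endswith(":"):
                # `N:' sets the glyph index used for the next range
                slot = int(field[:-1], 0)
                continue
            first, _, last = field.partition("_")
            start = int(first, 0)
            end = int(last, 0) if last else start
            for code in range(start, end + 1):
                table[code] = (name, slot)
                slot += 1
    return table
```

For instance, `parse_sfd(SFD_TEXT)[0xA1A1]' is `('sy', 1)' and
`parse_sfd(SFD_TEXT)[0xA2A1]' is `('sy', 101)', which is where the two
different offsets for the second byte (161 - 160 = 1, 161 - 60 = 101)
come from.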

> 4. Thus armed, I wrote a set of shell scripts to take the c42min.fd
>    file as input and output the JIS, KUTEN, and EUC decimal and
>    hexadecimal points, and also the mapping to UTF-8.  The
>    JIS0208.txt file from ftp.unicode.org has a JIS encoding column,
>    and no EUC.  Tips on how to do this process more easily much
>    appreciated.

I told you already: Look at how makeuniwada.pl works.  It reads the
files DNP.sfd and JIS0208.txt (and JIS0212.txt, which you can ignore
for the intended purpose).
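
As for the missing EUC column: an EUC-JP code for a JIS X 0208
character is simply the JIS code with the high bit set in both bytes.
A rough Python sketch of that step follows (it is not a transcription
of makeuniwada.pl).  The sample lines assume the three-column layout
Shift-JIS, JIS X 0208, Unicode with `#' comments; check this against
your copy of JIS0208.txt.  The function names are illustrative only.

```python
SAMPLE = """\
# assumed column order: Shift-JIS, JIS X 0208, Unicode
0x8140\t0x2121\t0x3000\t# IDEOGRAPHIC SPACE
0x8141\t0x2122\t0x3001\t# IDEOGRAPHIC COMMA
"""


def jis_to_euc(jis):
    """EUC-JP sets the high bit of both bytes of a JIS X 0208 code."""
    return jis | 0x8080


def euc_to_unicode(lines):
    """Map EUC codes to Unicode characters from JIS0208.txt-style lines."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        _sjis, jis, ucs = (int(field, 16) for field in line.split())
        table[jis_to_euc(jis)] = chr(ucs)
    return table
```

With the sample lines, `euc_to_unicode(SAMPLE.splitlines())[0xA1A1]'
is U+3000; calling `.encode("utf-8")' on the result gives the UTF-8
bytes.  For a real run, pass the open file instead of the sample.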

Nevertheless, since only characters from the range 0xA1A1-0xA1FE
appear in the Wadalab FDX files, it seems that you've done it right
:-) Thanks a lot!

> 6. I tested this using the long hyphen, and it works (character
>    30/252), so I am convinced from a practical standpoint. However,
>    looking at the CJKvert.sty I am confused, since \CJKsymbolsimple
>    takes only one argument, and I do not understand how it
>    differentiates between different subfonts from line to line.

CJK.sty makes the first characters of the two-byte JIS encoding active
(0xA1-0xFE); the associated macros automatically select the correct
subfont (based on the data in the ENC files).  If the subfont which
holds the vertical representation form is a different one, you have to
call \selectfont explicitly (as is done, for example, in
`c70bkai.fdx', which uses a subfont called `v').

> 7. If the above is right, it can be added to the gothic and maru .fdx
>    files as well, right?

Yes, please do.  After you've sent those files to me I'll include them
in both the git repository of the CJK package and the SVN repository
of TeXLive.

> 8. Next, I would like to help in creating support for half-width
>    katakana in UTF-8 encoding too. Is this at all feasible at
>    present?

What exactly do you want to achieve?  The Wadalab fonts don't contain
half-width representation forms of katakana...


    Werner

_______________________________________________
Cjk maillist  -  [email protected]
https://lists.ffii.org/mailman/listinfo/cjk
