Re: [NTG-context] Writing Japanese using ConTeXt

2003-06-15 Thread Matt Gushee
On Sun, Jun 15, 2003 at 11:03:06PM +0200, Hans Hagen wrote:

 A few questions:
 
 - What are the rules for line breaking?

For a detailed explanation, you should refer to the big book. But
actually the rules are not all that difficult--probably a good deal
simpler than European languages, I'd say. The most important thing to
know is that there is a certain set of characters that may not occur at
the end of a line, and another set that may not occur at the beginning,
and I believe (it's been a while since I seriously looked at any of
this) that there are certain unbreakable pairs, but not a huge number of
them.
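
Just to make that concrete, here is a rough sketch of the idea in Python
(the character sets are a tiny illustrative sample, nowhere near the
real, complete lists, and the function is my own, not from any
implementation):

# Sketch of kinsoku shori: some characters may not start a line,
# others may not end one.
NO_LINE_START = set("、。，．）」』】！？ーゃゅょっャュョッ")   # may not begin a line
NO_LINE_END = set("（「『【")                                   # may not end a line

def break_allowed(prev_char, next_char):
    """May a line break fall between prev_char and next_char?"""
    if prev_char in NO_LINE_END:      # don't strand an opening bracket at line end
        return False
    if next_char in NO_LINE_START:    # don't let closing punctuation begin a line
        return False
    return True

print(break_allowed("す", "。"))   # False: the full stop may not begin a line
print(break_allowed("「", "日"))   # False: the open quote may not end a line
print(break_allowed("日", "本"))   # True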

 - How many glyphs are there? (Well, I could look it up in the big CJK book.)

That's rather a tricky question, and the answer depends partly on
whether you want a complete solution or an 80/20 one. You probably know
that there are two main character sets in Japanese: jis-x-0208 and
jis-x-0212 (of course, the full names are suffixed with years, but I
forget what the current versions are). The vast majority of all Japanese
text (notice I said text, *not* documents) can be written with hiragana
and katakana (50+ characters each), roman alphabet (256, I guess?), and
the kanji in jis-x-0208, of which there are about 6000.

However, it's hard to get away without using jis-x-0212. Literary terms
and probably some specialized scientific vocabulary often require it,
and most critically, geographic and personal names very often use
jis-x-0212 characters. It's common to find names whose characters have
close equivalents in jis-x-0208, but where the particular person or
place uses a variant glyph that exists only in jis-x-0212. In Japanese
culture it is unacceptable to substitute glyphs in names. An analogy in
Western languages might be: suppose you had a typesetting system that
was incapable of rendering the string 'sen' at the end of a word, so
that whenever you encountered the names Andersen or Olsen, you would
print them as Anderson and Olson. I don't think anyone would consider
that acceptable.

So the upshot of this is that, though jis-x-0212 glyphs make up a very
small proportion of the Japanese text that is printed (I'd guess 1-2
percent), a large proportion of documents (40-50 percent, maybe) require
one or more glyphs from that set. So that's another 8000 glyphs, if you
want to do it right.

One other point that may or may not matter: I'm not sure if this is the
correct terminology, but the code points of the Japanese character sets
are arrayed in a sparse matrix. Each plane is 94x94 rather than 256x256;
as I recall, the reason is that each of the two bytes is kept within the
94 printable ASCII values (0x21-0x7E), so the encoding stays 7-bit safe.
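
A quick illustration of the row/cell ("kuten") arithmetic, in Python
just for illustration (kuten 16-01 is the first kanji in the standard
tables):

# JIS X 0208 addresses a character by row 1-94 and cell 1-94.
# JIS byte = index + 0x20 (printable ASCII); EUC-JP sets the high bit (+0x80).
def kuten_to_euc(row, cell):
    assert 1 <= row <= 94 and 1 <= cell <= 94
    return bytes([row + 0x20 + 0x80, cell + 0x20 + 0x80])

raw = kuten_to_euc(16, 1)       # kuten 16-01, the kanji for "Asia"
print(raw.hex())                # b0a1
print(raw.decode("euc-jp"))     # 亜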

-- 
Matt Gushee
Englewood, Colorado, USA
[EMAIL PROTECTED]
http://www.havenrock.com/

When a nation follows the Way,
Horses bear manure through its fields;
When a nation ignores the Way,
Horses bear soldiers through its streets.

--Lao Tzu (Peter Merel, trans.)


Re: [NTG-context] Writing Japanese using ConTeXt

2003-06-10 Thread Matthew Huggett
Matt Gushee wrote:

What would a good sample consist of? I can probably find something.

 

Well, for starters I guess samples showing the interaction of the four 
writing scripts (I'm thinking of glyph spacing and line-breaking here; 
e.g., in the transition from native script to Romaji and back again). 
Do you know much about different heading styles?  I suppose they are 
similar to the Chinese ones depending on how traditional the text is; 
i.e., kanji or Arabic numerals, the presence of a section kanji before 
the numbering, etc.   Examples of Furigana would be good.

Matt Huggett



RE: [NTG-context] Writing Japanese using ConTeXt

2003-06-10 Thread Tim 't Hart
Hello Hans and Matt,

 Can PDFTeX handle TTC files? I know ttf2afm/ttf2pk can process them, but
 I have tried 2 or 3 times to include a Japanese TTC font directly in a
 PDFTeX document, but was never able to make it work.
 
 dunno, maybe dvipdfmx can

I don't think PDFTeX can use TTC fonts. I use PDFTeX for DVI output and use
dvipdfmx for PDF. Map files for dvipdfmx support fonts inside a TrueType
Collection. TTF2TFM also supports the extra fonts inside a TTC by using the
-f switch.

For example, msmincho.ttc contains MS-Mincho and MS-PMincho. Using "min"
and "minp" as example TFM name prefixes with the Unicode.sfd subfont
scheme:
ttf2tfm msmincho.ttc min@Unicode@        (will make the TFMs for MS-Mincho)
ttf2tfm msmincho.ttc -f 1 minp@Unicode@  (will make the TFMs for MS-PMincho)

The map file for dvipdfmx will then look like:
min@Unicode@  Identity-H :0:msmincho.ttc  (for MS-Mincho)
minp@Unicode@ Identity-H :1:msmincho.ttc  (for MS-PMincho)

 Well, it can be done in stages. I think that any serious attempt to
 support Japanese in ConTeXt should encompass all common encodings. But
 I don't see anything wrong with starting out Unicode-only.
 
 in that case some range mapping should be defined; proper test files, etc

Right now I'm working on a home page which contains information about where
to find Japanese fonts and how to install them for ConTeXt/dvipdfmx. I will
also add some example files of what is already possible in ConTeXt. I'll
post the URL soon. 

My best,
Tim




Re: [NTG-context] Writing Japanese using ConTeXt

2003-06-09 Thread Matthew Huggett
Tim 't Hart wrote:

Recently, I've made the 'unwise' decision to start studying Japanese next
year, and of course I want to keep on using ConTeXt to write my school
papers. [] So I decided to find a way to
write Japanese in ConTeXt.
First I tried using the eOmega/ConTeXt combination since I have some great
OTPs for it, but soon found out that Omega is still the TeX of the future,
in other words, not the TeX of today and extremely unstable.
Then I decided to try ConTeXt's UTF-8 support. I created the following test
 

I asked about Japanese a while back.  Hans requested more information on 
encodings, fonts, etc.  I don't know enough about these things or 
ConTeXt to know what is needed exactly.

From what I've read, Unicode is not that popular in Japan itself.  The
most common encodings here are
a) iso-2022-jp (7-bit)
b) japanese-iso-8bit (a.k.a. euc-japan-1990, euc-japan, euc-jp)
c) japanese-shift-jis (Shift JIS, 8-bit; common under MS Windows)
The 'Describe Language Environment' command under MULE in GNU Emacs
gives some info, and Ken Lunde of Adobe has a book or two on processing
Japanese.
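
If it's useful, here is a quick way to see how the same text comes out
in each of these (plus UTF-8), using Python's codec names:

# Encode one string in each encoding mentioned above, plus UTF-8.
text = "日本語"
for name in ("iso2022_jp", "euc_jp", "shift_jis", "utf-8"):
    data = text.encode(name)
    print(name, len(data), "bytes:", data.hex())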

Typesetting Japanese could be more complicated than Chinese because of
the concurrent use of four writing systems:
a) Kanji (Chinese characters)
b) Hiragana (syllabic script for grammatical endings and for words for
which kanji are not commonly used)
c) Katakana (syllabic script for foreign loanwords, some scientific
terms (flora, fauna), and for emphasis)
d) Romaji, lit. 'Roman characters' (foreign languages, especially
English, rendered in Latin script); it is more common than you might
imagine.

I guess I need to track down a few sample documents.  I tried to turn up 
some info on Japanese typesetting rules but had no luck.

best wishes,

Matt



Re: [NTG-context] Writing Japanese using ConTeXt

2003-06-09 Thread Matt Gushee
On Mon, Jun 09, 2003 at 11:16:27PM +0900, Matthew Huggett wrote:
 
 Recently, I've made the 'unwise' decision to start studying Japanese next
 year,

Unwise? Only if you don't really want to do it, or if you are laboring
under illusions--left over from the 80s--that it will guarantee you a
lucrative and glamorous career in international trade ;-)

But anyway, I am also interested in using ConTeXt for Japanese, and
would be glad to contribute what I can to this effort.

 I asked about Japanese a while back.  Hans requested more information on 
 encodings, fonts, etc.  I don't know enough about these things or 
 ConTeXt to know what is needed exactly.

I don't know much about ConTeXt internals, but do know something about
these things, so I may be able to help. Was Hans' request on the
mailing list? If you know when it was posted, perhaps I can look it up.

 Typesetting Japanese could be more complicated than Chinese because of 
 the concurrent use of four writing systems:

On Mon, Jun 09, 2003 at 06:33:49PM +0200, Tim 't Hart wrote:
 
 Unicode wasn't that popular because Unix-like operating systems used EUC as
 encoding, and Microsoft used their own invented Shift-JIS encoding.

There were also cultural/political reasons, with perhaps a touch of Not
Invented Here syndrome. But that's a different story.

 So there
 is still a lot of digital text out there written in these encodings, and a
 lot of tools still use it. But I think that if you want to write new texts,
 using Unicode shouldn't be a problem for most users. I guess that most
 editors supporting Asian encodings also make it possible to save in UTF-8. I
 think nowadays it's easier to find a Unicode enabled editor than it is to
 find a Shift-JIS/EUC editor! (Well, on Windows anyway...).

Yes, recent Windows versions (starting with NT 4.0 in the business
series, and ... not sure ... ME? in the consumer series) use some form
of Unicode as their base encoding, so I think it is now the norm for
Windows text editors to support UTF-8 ... I'm pretty sure TextPad does,
for example.

 Since ConTeXt
 already supports UTF-8, I don't see a reason to make things more difficult
 than they already are by writing text in other encodings.

On the face of it that makes sense. But I don't think it's safe to make
a blanket assumption that the text in a ConTeXt document will originate
with the creator of the document, or that it will be newly written.
Also, UTF-8 support is still a bit half-baked on Unix/Linux systems.

 When I look at the source of the Chinese module, the most difficult part for
 me to understand is the part about font encoding, the enco-chi.tex file, and
 the use of \defineuclass in that file. I guess it has something to do
 with mapping the written text to the font.

Most likely. I might be able to glean something useful from that file.
I'll take a look when I can find the time.

 I guess that if you want to make a proper Japanese module, you'll need to
 support JIS or Shift-JIS encoded fonts.

This would be a good idea for Type 1 font support. It seems to me that
almost all recent Japanese TrueType fonts have a Unicode CMap.

 But on the other hand, maybe we
 don't need to support that since there are a lot of Japanese Unicode fonts
 available. I use WinXP, and there we have msmincho.ttc and msgothic.ttc,
 which are both Unicode fonts.

Can PDFTeX handle TTC files? I know ttf2afm/ttf2pk can process them, but
I have tried 2 or 3 times to include a Japanese TTC font directly in a
PDFTeX document, but was never able to make it work.

 And Cyberbit is a Unicoded font as well. Commercially available fonts by
 Dynalab (Dynafont Japanese TrueType collection is quite cheap and very good)
 are also Unicode fonts. Again, I don't think we should make it difficult for
 ourselves by trying to support non-Unicode fonts while unicoded Japanese
 fonts are easy to use and widely available.

Well, it can be done in stages. I think that any serious attempt to
support Japanese in ConTeXt should encompass all common encodings. But
I don't see anything wrong with starting out Unicode-only.

  Typesetting Japanese could be more complicated than Chinese because of
  the concurrent use of four writing systems 
 
 The fact that Japanese uses four writing systems is not really a problem.

Maybe it's not a big problem. But it is certainly more complex than
Chinese, since there is a mixture of proportional and fixed-width
characters, and the presence of Kana and Romaji complicates the
line-breaking rules.
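
A rough sketch of what I mean (my own toy rule, not anything from a real
implementation): between two kanji or kana almost any point is a legal
break, but inside a run of Romaji you can only break at spaces or with
hyphenation.

import unicodedata

def is_cjk(ch):
    # Crude test: treat kanji and kana as "CJK" for this sketch.
    return unicodedata.category(ch) == "Lo" and ord(ch) > 0x2E80

def break_points(text):
    """Indices where a line break could go (kinsoku rules ignored for brevity)."""
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if prev == " " or nxt == " ":
            yield i              # spaces always allow a break
        elif is_cjk(prev) or is_cjk(nxt):
            yield i              # any boundary touching a CJK character may break
        # otherwise we are inside a Romaji word: no break without hyphenation

print(list(break_points("日本語とTeXの組版")))   # yields no break inside "TeX"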

  I guess I need to track down a few sample documents.  I tried to turn up 
  some info on Japanese typesetting rules but had no luck.

What would a good sample consist of? I can probably find something.

 The only info I got is from Ken Lunde's CJKV book, where he mentions some
 rules about CJK line breaking.

Yes, Lunde is good, but he doesn't go into enough detail to serve as an
implementor's guide. I've also searched for more info on this subject;
my impression is that