Re: [NTG-context] Writing Japanese using ConTeXt
On Sun, Jun 15, 2003 at 11:03:06PM +0200, Hans Hagen wrote:

> A few questions;
> - How are the rules for breaking?

For a detailed explanation, you should refer to the big book. But actually the rules are not all that difficult, probably a good deal simpler than for European languages, I'd say. The most important thing to know is that there is a certain set of characters that may not occur at the end of a line, and another set that may not occur at the beginning, and I believe (it's been a while since I seriously looked at any of this) that there are certain unbreakable pairs, but not a huge number of them.

> - how many glyphs are there (well, i could look it up in the big cjk book)

That's rather a tricky question, and the answer depends partly on whether you want a complete solution or an 80/20 one. You probably know that there are two main character sets in Japanese: jis-x-0208 and jis-x-0212 (of course, the full names are suffixed with years, but I forget what the current versions are). The vast majority of all Japanese text (notice I said text, *not* documents) can be written with hiragana and katakana (50+ characters each), roman alphabet (256, I guess?), and the kanji in jis-x-0208, of which there are about 6000.

However, it's hard to get away without using jis-x-0212. Literary terms and probably some specialized scientific vocabulary often require it, and most critically, geographic and personal names very often use jis-x-0212 characters. It's common to find names whose characters are nominally represented in jis-x-0208, but where a particular name must be written with a different glyph that exists only in jis-x-0212. In Japanese culture it is unacceptable to substitute glyphs in names. An analogy in Western languages might be: suppose you had a typesetting system that was incapable of rendering the string "sen" at the end of a word. Thus, whenever you encountered the names Andersen or Olsen, you would print them as Anderson and Olson. I don't think anyone would consider that acceptable.
So the upshot of this is that, though jis-x-0212 glyphs make up a very small proportion of the Japanese text that is printed (I'd guess 1-2 percent), a large proportion of documents (40-50 percent, maybe) require one or more glyphs from that set. So that's roughly another 6000 glyphs, if you want to do it right.

One other point that may or may not matter is that ... I'm not sure if this is the correct terminology, but the code points of the Japanese character sets are arrayed in a sparse matrix (?). Each plane is 94x94, rather than 256x256. I used to know why.

--
Matt Gushee
Englewood, Colorado, USA
[EMAIL PROTECTED]
http://www.havenrock.com/

When a nation follows the Way,
Horses bear manure through its fields;
When a nation ignores the Way,
Horses bear soldiers through its streets.
--Lao Tzu (Peter Merel, trans.)

___
ntg-context mailing list
[EMAIL PROTECTED]
http://www.ntg.nl/mailman/listinfo/ntg-context
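The line-breaking rule described in the message above (one set of characters may not end a line, another may not begin one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two character sets below are small hypothetical samples, not a complete prohibition table, every glyph is assumed to be one fixed-width cell, and the unbreakable pairs mentioned above are not handled.

```python
from __future__ import annotations

# Hypothetical, incomplete sample sets; a real table is considerably larger.
NO_LINE_START = set("、。，．）」』】ーぁぃぅぇぉっゃゅょ")  # may not begin a line
NO_LINE_END = set("（「『【")                                # may not end a line


def break_allowed(text: str, i: int) -> bool:
    """Return True if a line break may fall between text[i-1] and text[i]."""
    if i <= 0 or i >= len(text):
        return False
    return text[i - 1] not in NO_LINE_END and text[i] not in NO_LINE_START


def wrap(text: str, width: int) -> list[str]:
    """Greedy wrap: at most `width` characters per line, moving each
    candidate break to the left until it lands on an allowed position."""
    lines = []
    start = 0
    while len(text) - start > width:
        cut = start + width
        while cut > start + 1 and not break_allowed(text, cut):
            cut -= 1
        if not break_allowed(text, cut):
            cut = start + width  # no legal break found; fall back to a hard break
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines


if __name__ == "__main__":
    for line in wrap("これは「テスト」です。", 4):
        print(line)
```

With a width of 4, the sample string breaks after three characters on each line, because a break at position 4 would leave an opening bracket at the end of a line or push a closing bracket or full stop to the start of the next one. A real implementation would more likely stretch or shrink inter-character space rather than only pulling the break leftward.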
Re: [NTG-context] Writing Japanese using ConTeXt
Matt Gushee wrote:

> What would a good sample consist of? I can probably find something.

Well, for starters I guess samples showing the interaction of the four writing scripts (I'm thinking of glyph spacing and line-breaking here; e.g., in the transition from native script to Romaji and back again).

Do you know much about different heading styles? I suppose they are similar to the Chinese ones depending on how traditional the text is; i.e., kanji or Arabic numerals, the presence of a section kanji before the numbering, etc.

Examples of Furigana would be good.

Matt Huggett
RE: [NTG-context] Writing Japanese using ConTeXt
Hello Hans and Matt,

>> Can PDFTeX handle TTC files? I know ttf2afm/ttf2pk can process them, but I have tried 2 or 3 times to include a Japanese TTC font directly in a PDFTeX document, but was never able to make it work.

> dunno, maybe dvipdfmx can

I don't think PDFTeX can use TTC fonts. I use PDFTeX for DVI output and use dvipdfmx for PDF. Map files for dvipdfmx support fonts inside a TrueType Collection. TTF2TFM also supports the extra fonts inside a TTC by using the -f switch. For example, msmincho.ttc contains MS-Mincho and MS-PMincho:

    ttf2tfm msmincho.ttc [EMAIL PROTECTED]@         (will make TFM for MS-Mincho)
    ttf2tfm msmincho.ttc -f 1 [EMAIL PROTECTED]@    (will make TFM for MS-PMincho)

The map file for dvipdfmx will then look like:

    [EMAIL PROTECTED]@ Identity-H :0:msmincho.ttc    (for MS-Mincho)
    [EMAIL PROTECTED]@ Identity-H :1:msmincho.ttc    (for MS-PMincho)

>> Well, it can be done in stages. I think that any serious attempt to support Japanese in ConTeXt should encompass all common encodings. But I don't see anything wrong with starting out Unicode-only.

> in that case some range mapping should be defined; proper test files, etc

Right now I'm working on a home page which contains information about where to find Japanese fonts and how to install them for ConTeXt/dvipdfmx. I will also add some example files of what is already possible in ConTeXt. I'll post the URL soon.

My best,
Tim
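A TrueType Collection such as msmincho.ttc stores several fonts in one file, which is why the commands above need an index (-f 1, :0:, :1:). As a rough illustration, the Python sketch below reads the collection header defined in the OpenType specification and lists where each member font's table directory starts. The function name `ttc_font_offsets` is made up for this example, and it is an assumption, not something stated in this thread, that the indices used by ttf2tfm and dvipdfmx follow the order of this offset list.

```python
from __future__ import annotations

import struct


def ttc_font_offsets(data: bytes) -> list[int]:
    """Return the byte offset of each member font's table directory.

    TTC header layout (big-endian): 4-byte tag 'ttcf', uint16 major version,
    uint16 minor version, uint32 font count, then one uint32 offset per font.
    """
    if len(data) < 12 or data[:4] != b"ttcf":
        raise ValueError("not a TrueType Collection (missing 'ttcf' tag)")
    _major, _minor, num_fonts = struct.unpack(">HHI", data[4:12])
    end = 12 + 4 * num_fonts
    if len(data) < end:
        raise ValueError("truncated TTC header")
    return list(struct.unpack(f">{num_fonts}I", data[12:end]))


if __name__ == "__main__":
    import sys

    # Usage: python ttc_list.py path/to/font.ttc
    with open(sys.argv[1], "rb") as fh:
        offsets = ttc_font_offsets(fh.read())
    for index, offset in enumerate(offsets):
        print(f"font {index}: table directory at byte {offset}")
```

Running it on a collection with two members prints two lines, indexed 0 and 1; it does not report the font names, which would require parsing each member's `name` table.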
Re: [NTG-context] Writing Japanese using ConTeXt
Tim 't Hart wrote:

> Recently, I've made the 'unwise' decision to start studying Japanese next year, and of course I want to keep on using ConTeXt to write my school papers.
> [...]
> So I decided to find a way to write Japanese in ConTeXt. First I tried using the eOmega/ConTeXt combination since I have some great OTPs for it, but soon found out that Omega is still the TeX of the future, in other words, not the TeX of today and extremely unstable. Then I decided to try ConTeXt's UTF-8 support. I created the following test

I asked about Japanese a while back. Hans requested more information on encodings, fonts, etc. I don't know enough about these things or ConTeXt to know what is needed exactly.

From what I've read, Unicode is not that popular in Japan itself. The most common encodings here are

a) iso-2022-jp (7bit)
b) japanese-iso-8bit (a.k.a. euc-japan-1990, euc-japan, euc-jp)
c) japanese-shift-jis (shift jis 8bit; common under MS Windows)

Describe Language Environment under MULE in GNU Emacs gives some info. Ken Lunde of Adobe has a book or two on processing Japanese.

Typesetting Japanese could be more complicated than Chinese because of the concurrent use of four writing systems:

a) Kanji (Chinese characters)
b) Hiragana (syllabic script for representing grammatical endings and words for which Kanji are not commonly used)
c) Katakana (syllabic script for representing foreign words, some scientific words (flora, fauna), and for emphasis)
d) Romaji, lit. Roman characters (sometimes foreign languages, especially English, are represented in Latin script). It is more common than you might imagine.

I guess I need to track down a few sample documents. I tried to turn up some info on Japanese typesetting rules but had no luck.

best wishes,
Matt
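To make the difference between the three encodings listed above concrete, here is a small Python sketch that encodes the same short string in each of them and converts legacy bytes to UTF-8. The codec names `iso2022_jp`, `euc_jp` and `shift_jis` are those of Python's standard library; the sample string and the helper name `to_utf8` are arbitrary choices for this illustration and have nothing to do with ConTeXt itself.

```python
SAMPLE = "日本語のテキスト"  # arbitrary sample: kanji, hiragana and katakana

LEGACY_CODECS = ("iso2022_jp", "euc_jp", "shift_jis")


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Transcode bytes from a legacy Japanese encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    for codec in LEGACY_CODECS + ("utf-8",):
        encoded = SAMPLE.encode(codec)
        # Same text, different byte sequences and lengths per encoding.
        print(f"{codec:12} {len(encoded):3} bytes  {encoded.hex(' ')}")

    # Text arriving in any of the legacy encodings ends up as the same UTF-8.
    for codec in LEGACY_CODECS:
        assert to_utf8(SAMPLE.encode(codec), codec) == SAMPLE.encode("utf-8")
```

A preprocessing step of this kind is one possible way to feed existing EUC or Shift-JIS text to a UTF-8-only setup; it only works for characters that both the source encoding and the target font actually cover.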
Re: [NTG-context] Writing Japanese using ConTeXt
On Mon, Jun 09, 2003 at 11:16:27PM +0900, Matthew Huggett wrote:

> Recently, I've made the 'unwise' decision to start studying Japanese next year,

Unwise? Only if you don't really want to do it, or if you are laboring under illusions (left over from the 80s) that it will guarantee you a lucrative and glamorous career in international trade ;-)

But anyway, I am also interested in using ConTeXt for Japanese, and would be glad to contribute what I can to this effort.

> I asked about Japanese a while back. Hans requested more information on encodings, fonts, etc. I don't know enough about these things or ConTeXt to know what is needed exactly.

I don't know much about ConTeXt internals, but do know something about these things, so I may be able to help. Was Hans' request on the mailing list? If you know when it was posted, perhaps I can look it up.

> Typesetting Japanese could be more complicated than Chinese because of the concurrent use of four writing systems:

On Mon, Jun 09, 2003 at 06:33:49PM +0200, Tim 't Hart wrote:

> Unicode wasn't that popular because Unix-like operating systems used EUC as encoding, and Microsoft used their own invented Shift-JIS encoding.

There were also cultural/political reasons, with perhaps a touch of Not Invented Here syndrome. But that's a different story.

> So there is still a lot of digital text out there written in these encodings, and a lot of tools still use it. But I think that if you want to write new texts, using Unicode shouldn't be a problem for most users. I guess that most editors supporting Asian encodings also make it possible to save in UTF-8. I think nowadays it's easier to find a Unicode enabled editor than it is to find a Shift-JIS/EUC editor! (Well, on Windows anyway...).

Yes, recent Windows versions (starting with NT 4.0 in the business series, and ... not sure ... ME? in the consumer series) use some form of Unicode as their base encoding, so I think it is now the norm for Windows text editors to support UTF-8 ... I'm pretty sure TextPad does, for example.

> Since ConTeXt already supports UTF-8, I don't see a reason to make things more difficult than they already are by writing text in other encodings.

On the face of it that makes sense. But I don't think it's safe to make a blanket assumption that the text in a ConTeXt document will originate with the creator of the document, or that it will be newly written. Also, UTF-8 support is still a bit half-baked on Unix/Linux systems.

> When I look at the source of the Chinese module, the most difficult part for me to understand is the part about font encoding, the enco-chi.tex file, and the use of \defineuclass in that file. I guess it has to do something with mapping the written text to the font.

Most likely. I might be able to glean something useful from that file. I'll take a look when I can find the time.

> I guess that if you want to make a proper Japanese module, you'll need to support JIS or Shift-JIS encoded fonts.

This would be a good idea for Type 1 font support. It seems to me that almost all recent Japanese TrueType fonts have a Unicode CMap.

> But on the other hand, maybe we don't need to support that since there are a lot of Japanese Unicode fonts available. I use WinXP, and there we have msmincho.ttc and msgothic.ttc, which are both Unicode fonts.

Can PDFTeX handle TTC files? I know ttf2afm/ttf2pk can process them, but I have tried 2 or 3 times to include a Japanese TTC font directly in a PDFTeX document, but was never able to make it work.

> And Cyberbit is a Unicoded font as well. Commercially available fonts by Dynalab (Dynafont Japanese TrueType collection is quite cheap and very good) are also Unicode fonts. Again, I don't think we should make it difficult for ourselves by trying to support non-Unicode fonts while unicoded Japanese fonts are easy to use and widely available.

Well, it can be done in stages. I think that any serious attempt to support Japanese in ConTeXt should encompass all common encodings. But I don't see anything wrong with starting out Unicode-only.

>> Typesetting Japanese could be more complicated than Chinese because of the concurrent use of four writing systems

> The fact that Japanese uses four writing systems is not really a problem.

Maybe it's not a big problem. But it is certainly more complex than Chinese, since there is a mixture of proportional and fixed-width characters, and the presence of Kana and Romaji complicates the line-breaking rules.

>> I guess I need to track down a few sample documents. I tried to turn up some info on Japanese typesetting rules but had no luck.

What would a good sample consist of? I can probably find something.

> The only info I got is from Ken Lunde's CJKV book, where he mentions some rules about CJK line breaking.

Yes, Lunde is good, but he doesn't go into enough detail to serve as an implementor's guide. I've also searched for more info on this subject; my impression is that