srintuar26 wrote:

Chinese and Japanese (not Korean) don't use
whitespace between "words".



Ooh, that makes me curious: is there a good discussion of how to
line-break Japanese text? I wonder how browsers are doing it...


As far as line breaking is concerned, it's not hard to do it right for Japanese
text. All a browser needs to do is NOT break where line breaking is
prohibited, as specified in JIS X 14xxx(?)[1], and break at other places
(syllable boundaries, character boundaries[2]) so that the text is as close to
justified (on both sides) as possible. The same is true of Korean and Chinese.
It doesn't make any difference whether spaces are used or not in
Japanese/Korean/Chinese. Mozilla (and I guess MS IE as well) supports
JIS X 14xxx for Japanese, Korean and Chinese.[3] Thai text is harder than
this, and that's where you need to pay more attention. The Thai line breaking
rule is also supported by Mozilla.


As I wrote earlier, programs like 'fmt' should support this.

Netscape 3.x broke lines ONLY at spaces, so some Korean web page
authors used a simple perl script to insert a <wbr> tag everywhere (at every
syllable boundary) line breaking is allowed.
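
The script itself isn't shown here, so the following is only a rough sketch of
the idea, written in Python rather than perl. The function names are made up
for illustration, and it assumes plain text (no markup to skip over) made of
precomposed Hangul syllables (U+AC00 to U+D7A3):

```python
def is_hangul_syllable(ch):
    # Precomposed modern Hangul syllables occupy U+AC00..U+D7A3.
    return "\uac00" <= ch <= "\ud7a3"


def insert_wbr(text):
    """Insert <wbr> between every pair of adjacent Hangul syllables.

    Hypothetical helper: it does not parse HTML, so it should only be
    applied to text nodes, not to tags or attribute values.
    """
    out = []
    for i, ch in enumerate(text):
        if i > 0 and is_hangul_syllable(text[i - 1]) and is_hangul_syllable(ch):
            out.append("<wbr>")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(insert_wbr("한국어 text"))
```

Since a break is only offered between two syllables, punctuation is never
separated from the syllable it follows.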


[1] The prohibition rules are not rocket science. You can easily guess
them. Here are some examples:

 - lines cannot be broken after an opening quotation mark, single
   or double. That is, a line cannot end with one.

 - lines cannot be broken before a comma, a period, a question mark, or an
   exclamation mark. That is, a line cannot begin with one.

 - There are some Kana-specific rules I don't remember at the moment.
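
The two example rules above can be sketched in a few lines of Python. This is
only an illustration, not the rule set from the standard: the character sets
below are my own guess at what "opening quotation mark" and "comma, period,
..." cover, the Kana-specific rules are left out, every character is counted
as one column, and the names `can_break` and `wrap` are made up.

```python
# Illustrative (incomplete) sets; the real rules list more characters.
NO_BREAK_AFTER = set("“‘「『")          # a line cannot end with these
NO_BREAK_BEFORE = set(",.?!、。，．？！")  # a line cannot begin with these


def can_break(text, i):
    """True if a line may be broken between text[i-1] and text[i]."""
    if i <= 0 or i >= len(text):
        return False
    return text[i - 1] not in NO_BREAK_AFTER and text[i] not in NO_BREAK_BEFORE


def wrap(text, width):
    """Greedy wrap: fill each line up to `width` characters, backing up
    to the nearest position where a break is not prohibited."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        while end > start and not can_break(text, end):
            end -= 1
        if end == start:
            # No permitted break in this stretch; force one at `width`.
            end = start + width
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines


if __name__ == "__main__":
    for line in wrap("あいう、えお", 3):
        print(line)
```

With a width of 3, the comma is not allowed to start the second line, so the
first line is cut one character short instead.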


[2] To generalize, I'd use 'grapheme boundaries'. See Unicode TR #29 for details.

[3] See also Unicode TR #14.
When you read UTR #14, be aware that its treatment of
Korean line breaking is not satisfactory. Simply put, Korean text
can be broken at any *grapheme boundary* (when NFC is used
for modern text, that means at any Unicode code point boundary
for modern syllables) as well as at spaces, except for about
a dozen places where line breaking is prohibited (see the JIS X 14xxx
mentioned above). 99% of Korean text in print, formal or informal,
uses a layout justified on both sides, but TR #14 gives
the *wrong* impression that about half of Korean text breaks lines
only at spaces and uses a ragged (unjustified) style. The author of TR #14
wouldn't listen to my feedback, insisting that he had plenty of
printed materials contradicting what I had told him, though he did
acknowledge that feedback at the end of TR #14.
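
To illustrate the NFC remark in [3]: a modern syllable written as separate
conjoining jamo becomes a single code point after NFC normalization, so
code point boundaries and syllable boundaries coincide. A small Python check,
assuming only the standard `unicodedata` module (the helper name `nfc_units`
is made up):

```python
import unicodedata


def nfc_units(text):
    """Normalize to NFC and return the resulting code points as a list.

    For modern Hangul, each element is then one whole syllable.
    """
    return list(unicodedata.normalize("NFC", text))


if __name__ == "__main__":
    # "한글" spelled with conjoining jamo: 6 code points before NFC.
    decomposed = "\u1112\u1161\u11ab\u1100\u1173\u11af"
    print(len(decomposed), nfc_units(decomposed))
```

This does not hold for old Hangul syllables that have no precomposed form,
which is why grapheme boundaries are the more general notion.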


-- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/


