Re: Korean words breaking in the middle

Manuel Mall Tue, 05 Jun 2007 16:01:09 -0700

On Tuesday 05 June 2007 21:14, Manuel Mall wrote:
> On Tuesday 05 June 2007 20:39, Brad Smith wrote:
> > I am having an issue with Korean words not breaking properly. I'm
> > using a fop build based on the svn trunk as of 2007-05-01. The
> > attached screenshot shows an example, courtesy of our Korean
> > translator. The blue rectangle on the left shows the text
> > as rendered, with each line ending in a mid-word wrap. The
> > rectangle on the right shows the complete words as they should be
> > rendered (unlike Japanese and Chinese, Korean uses space-delimited
> > words like Western languages). I've checked the source and there
> > are no spaces within these words.
> >
> > Any ideas? Is a fix in the works for this?
>
> Brad,
>
> FOP uses a pair-based line-breaking algorithm based on Unicode
> Standard Annex #14 (http://www.unicode.org/reports/tr14/). This
> document actually makes quite a few comments on special cases for the
> Korean language. Most of that is straight over my head, as I have no
> idea what, for example, a 'Hangul syllable and conjoining jamo'
> actually is. It would be really helpful if someone with real
> knowledge of the Korean language could read and interpret those
> parts of the standard for us and maybe point out where exactly we
> may be doing something wrong. It sounds to me like Korean has two
> different ways of line breaking, and possibly our default UAX#14
> implementation supports only one of them.
>
> I know this doesn't help you directly but the first step here for me
> would be getting help in understanding the issue in the context of
> the Unicode standard annex we try to implement.
>
> Thanks
>
> Manuel
>
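
For anyone unfamiliar with the pair-based approach mentioned above, here is a heavily simplified sketch of the idea. It is not FOP's code: it assumes just three classes (AL, ID and SP), the class and method names are hypothetical, and a real pair table covers dozens of classes and also has "prohibited" entries. A pair of adjacent non-space classes is looked up; a DIRECT entry allows a break right there, an INDIRECT entry allows one only if spaces intervene.

```java
import java.util.ArrayList;
import java.util.List;

class PairTableSketch {
    // Tiny subset of the UAX #14 line-breaking classes (assumption: enough for this sketch).
    enum Cls { AL, ID, SP }

    // Pair-table entries used here; a full table also has "prohibited" entries.
    enum Brk { DIRECT, INDIRECT }

    // Pair lookup: class before the candidate break, class after it.
    static Brk lookup(Cls before, Cls after) {
        if (before == Cls.AL && after == Cls.AL) {
            return Brk.INDIRECT; // letters: break only if spaces intervene
        }
        return Brk.DIRECT; // any pair involving ID: break allowed even without spaces
    }

    // Crude classifier for this sketch only: precomposed Hangul syllables
    // (U+AC00..U+D7A3) are treated as ID, the UAX #14 default behaviour;
    // a space is SP, everything else is AL.
    static Cls classify(char c) {
        if (c == ' ') return Cls.SP;
        if (c >= 0xAC00 && c <= 0xD7A3) return Cls.ID;
        return Cls.AL;
    }

    // Returns the indices i at which a new line may start (a break before text.charAt(i)).
    static List<Integer> breakOpportunities(String text) {
        List<Integer> result = new ArrayList<>();
        int prev = -1; // index of the last non-space character seen
        for (int i = 0; i < text.length(); i++) {
            Cls cur = classify(text.charAt(i));
            if (cur == Cls.SP) continue; // spaces are skipped; never break before a space
            if (prev >= 0) {
                boolean spacesBetween = i - prev > 1;
                Brk b = lookup(classify(text.charAt(prev)), cur);
                if (b == Brk.DIRECT || (b == Brk.INDIRECT && spacesBetween)) {
                    result.add(i);
                }
            }
            prev = i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(breakOpportunities("ab cd"));                      // [3]
        System.out.println(breakOpportunities("\uAC00\uB098 \uB2E4\uB77C")); // [1, 3, 4]
    }
}
```

With Hangul treated as ID-like, the two-syllable Korean words get a break opportunity between their syllables (indices 1 and 4), which is the kind of mid-word wrapping Brad describes; the Latin words only break at the space.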


The following snippet is taken from the Unicode Standard Annex #14:

= START =

H2: Hangul LV Syllable (B/A)

This class includes all characters of Hangul Syllable Type LV.

Together with conjoining jamos, Hangul syllables form Korean Syllable 
Blocks, which are kept together; see [Boundaries]. Korean uses 
space-based line breaking in many styles of documents. To support 
these, Hangul syllables and conjoining jamo need to be tailored to use 
class AL, while the default in this specification is class ID, which 
supports the case of Korean documents not using space-based line 
breaking. See Section 8.1, Types of Tailoring. See also JL, JT, JV, and 
H3.

= END =
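
As an aside on what "Hangul Syllable Type LV" means: the precomposed Hangul syllables U+AC00..U+D7A3 are laid out arithmetically (19 leading consonants x 21 vowels x 28 trailing positions, where position 0 means "no final consonant"), so whether a syllable is LV (class H2) or LVT (class H3) can be computed from its code point. A minimal sketch with hypothetical names; the conjoining jamo classes JL, JV and JT are not covered.

```java
class HangulLineBreakClass {
    static final int S_BASE = 0xAC00;  // first precomposed Hangul syllable
    static final int T_COUNT = 28;     // 27 trailing consonants + "no trailing consonant"
    static final int S_COUNT = 11172;  // 19 leading * 21 vowels * 28 trailing

    // Returns "H2" for LV syllables (no final consonant), "H3" for LVT syllables,
    // or null for code points outside the precomposed syllable block.
    static String lineBreakClass(int codePoint) {
        int sIndex = codePoint - S_BASE;
        if (sIndex < 0 || sIndex >= S_COUNT) {
            return null;
        }
        return (sIndex % T_COUNT == 0) ? "H2" : "H3";
    }

    public static void main(String[] args) {
        System.out.println(lineBreakClass(0xAC00)); // U+AC00 (ga): H2
        System.out.println(lineBreakClass(0xAC01)); // U+AC01 (gak): H3
        System.out.println(lineBreakClass(0xD55C)); // U+D55C (han): H3
        System.out.println(lineBreakClass('a'));    // null
    }
}
```

Since most Korean syllables carry a final consonant, H3 is at least as common as H2 in ordinary text.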

This seems to indicate that the default for Korean is 'ideographic' line 
breaking rather than space-based line breaking. FOP implements only this 
default and does not support tailoring.

You could build yourself a custom FOP version with a line-breaking table 
that treats character class H2 (and the related classes H3, JL, JV and 
JT) like AL, as suggested in the snippet from the Unicode annex above. I 
can give you pointers on how to do that.
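
A rough sketch of what such a tailoring amounts to: the Hangul classes are remapped to AL before the pair rules are consulted. Everything here is hypothetical and heavily simplified (only AL, H2, H3 and SP, plus four rules paraphrasing LB7, LB18, LB28 and LB31 as numbered in recent UAX #14 revisions); FOP's real line-breaking code is table-driven and looks different.

```java
import java.util.ArrayList;
import java.util.List;

class KoreanTailoringSketch {
    enum Cls { AL, H2, H3, SP }

    // Hypothetical classifier covering only what this sketch needs.
    static Cls defaultClass(char c) {
        if (c == ' ') return Cls.SP;
        int sIndex = c - 0xAC00;                      // precomposed Hangul syllables
        if (sIndex >= 0 && sIndex < 11172) {
            return (sIndex % 28 == 0) ? Cls.H2 : Cls.H3;
        }
        return Cls.AL;
    }

    // The tailoring suggested by UAX #14 for space-based Korean: Hangul -> AL.
    static Cls tailor(Cls c, boolean spaceBasedKorean) {
        if (spaceBasedKorean && (c == Cls.H2 || c == Cls.H3)) return Cls.AL;
        return c;
    }

    // Simplified pair rules.
    static boolean breakAllowed(Cls before, Cls after) {
        if (after == Cls.SP) return false;                      // LB7: no break before a space
        if (before == Cls.SP) return true;                      // LB18: break after spaces
        if (before == Cls.AL && after == Cls.AL) return false;  // LB28: AL x AL
        return true;                                            // LB31: break everywhere else
    }

    // Indices i where a break before text.charAt(i) is allowed.
    static List<Integer> breakOpportunities(String text, boolean spaceBasedKorean) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < text.length(); i++) {
            Cls before = tailor(defaultClass(text.charAt(i - 1)), spaceBasedKorean);
            Cls after = tailor(defaultClass(text.charAt(i)), spaceBasedKorean);
            if (breakAllowed(before, after)) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String korean = "\uAC00\uB098 \uB2E4\uB77C";
        System.out.println(breakOpportunities(korean, false)); // [1, 3, 4]
        System.out.println(breakOpportunities(korean, true));  // [3]
    }
}
```

With the tailoring switched on, the only break opportunity left is after the space, which is the behaviour the Korean translator expects.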

Of course the better solution would be to allow some tailoring of the 
line-breaking code, but I don't have the time to look into that at the 
moment. If you would like to have a go, feel most welcome to do so. 
Again, I would be happy to give pointers.

Regards

Manuel

> > --Brad
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail:
> [EMAIL PROTECTED]
