Summary: New hyphenation patterns
           Product: Fop
           Version: all
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: general

Created an attachment (id=24069)
classes for hyphenation, generated from UnicodeData.txt

The TeX people are now moving to Unicode-based TeX engines. They have therefore
created new hyphenation pattern files in UTF-8 encoding, see
These pattern files can be transformed directly into XML format and used in
FOP. I tested a few and had no problems.
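
As a rough illustration of that transformation, here is a minimal sketch in
Python. It is not the procedure I actually used. It assumes the TeX file keeps
its patterns in a single \patterns{...} group and its exceptions in an optional
\hyphenation{...} group, and that the target is FOP's hyphenation XML layout
(hyphenation-info containing classes, exceptions and patterns). The function
names are hypothetical.

```python
import re
from xml.sax.saxutils import escape


def strip_tex_comments(text):
    """Drop everything from a '%' to the end of each line."""
    return "\n".join(line.split("%", 1)[0] for line in text.splitlines())


def extract_group(text, macro):
    """Return the whitespace-separated entries of the first \\macro{...} group."""
    match = re.search(r"\\" + macro + r"\s*\{([^{}]*)\}", text)
    return match.group(1).split() if match else []


def tex_to_fop_xml(tex_source, classes=()):
    """Build a FOP-style hyphenation XML document from TeX pattern source.

    `classes` is a sequence of class strings such as "aA"; the TeX files
    do not provide them, so they have to come from elsewhere.
    """
    body = strip_tex_comments(tex_source)
    patterns = extract_group(body, "patterns")
    exceptions = extract_group(body, "hyphenation")

    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<hyphenation-info>"]
    lines.append("<classes>")
    lines.extend(escape(c) for c in classes)
    lines.append("</classes>")
    lines.append("<exceptions>")
    lines.extend(escape(e) for e in exceptions)
    lines.append("</exceptions>")
    lines.append("<patterns>")
    lines.extend(escape(p) for p in patterns)
    lines.append("</patterns>")
    lines.append("</hyphenation-info>")
    return "\n".join(lines) + "\n"
```

The result should be written out as UTF-8. Real pattern files may need more
care than this, for example if they wrap the groups in additional macros.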

They lack one thing, however: classes. FOP uses classes to determine what is a
letter (only words consisting entirely of letters are hyphenated) and to define
the LC/UC (lowercase/uppercase) mapping. TeX gets the classes from its Unicode setup, see e.g.
I have tried to do the same, and I attach the result. These classes would be
valid for every hyphenation pattern file. Some localizations seem to have their
own variants of the LC/UC mapping, but I have not investigated that.

The classes were generated as follows. Roughly, each character that is its own
LC generates a class, and its UC and TC (title case character) are added to
that class. More precisely, a character generates a class if it meets all of
the following conditions:
1. It is in the first plane (the Basic Multilingual Plane).
2. It has category Ll, Lu or Lt and is its own LC character, or it has
category Lo.
3. It is not in one of the following blocks: Superscripts and Subscripts,
Letterlike Symbols, Alphabetic Presentation Forms, Halfwidth and Fullwidth
Forms, CJK Unified Ideographs, CJK Unified Ideographs Extension A, Hangul
Syllables.
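
The selection rules can be sketched as follows. This is an illustration in
Python, not the tool that produced the attachment. It uses Python's bundled
Unicode database instead of parsing UnicodeData.txt directly, so the result
depends on the Unicode version of the interpreter. Python's str.upper() and
str.title() also apply full case mappings, so the sketch keeps only
single-character results as an approximation of the simple mappings in
UnicodeData.txt. All function names are hypothetical.

```python
import unicodedata

# Code point ranges of the excluded Unicode blocks.
EXCLUDED_BLOCKS = [
    (0x2070, 0x209F),  # Superscripts and Subscripts
    (0x2100, 0x214F),  # Letterlike Symbols
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0xFB00, 0xFB4F),  # Alphabetic Presentation Forms
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
]


def in_excluded_block(cp):
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_BLOCKS)


def generates_class(cp):
    """Apply rules 1 to 3 to a single code point."""
    if cp > 0xFFFF or in_excluded_block(cp):
        return False
    ch = chr(cp)
    category = unicodedata.category(ch)
    if category == "Lo":
        return True
    if category in ("Ll", "Lu", "Lt"):
        return ch.lower() == ch  # the character is its own LC
    return False


def build_class(cp):
    """Return the class string: the character, then its UC and TC if distinct."""
    ch = chr(cp)
    members = [ch]
    for mapped in (ch.upper(), ch.title()):
        # Skip multi-character full mappings such as 'ß' -> 'SS'.
        if len(mapped) == 1 and mapped not in members:
            members.append(mapped)
    return "".join(members)


def build_classes():
    """Collect one class per qualifying character in the first plane."""
    return [build_class(cp) for cp in range(0x10000) if generates_class(cp)]
```

With these rules, 'a' yields the class "aA", while 'A' yields no class of its
own because it is not its own LC.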

There are two options: add these classes to each hyphenation file, or add them
to the code that generates the hyphenation trie, preferably reading them from a
separate file. I prefer the latter option. What do you think?
