Manuel Mall wrote:
While investigating whether we could use the standard java.text.BreakIterator to determine line break points, I noticed that FOP treats the forward slash, in addition to space, zero width space and hyphen, as a valid line breaking character. The Java BreakIterator does not recognize slash as a line breaking char (nor, FWIW, does MS Word).

What is the background to FOP allowing this? Is this consistent with normal user expectations, or is it specific to typesetting environments / TeX / Knuth?


The BreakIterator class is supposed to implement Unicode Standard
Annex #14 (TR14), the Unicode line breaking algorithm
 http://www.unicode.org/reports/tr14/
The slash, U+002F SOLIDUS, is assigned the line breaking property
value SY (Symbols Allowing Break After)
 http://www.unicode.org/Public/UNIDATA/LineBreak.txt
which means "prevent a break before, and allow a break after". I suspect
this is a recent change in Unicode that your JDK release does not
implement yet.
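
A quick way to see what a particular JDK actually does with the slash is
to ask the line BreakIterator for its boundaries. This is only a sketch:
the class name SlashBreakProbe, the helper names and the sample string
are made up for illustration, and the result depends on the JDK release,
so no expected output is given.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical probe class, not part of FOP or the JDK.
class SlashBreakProbe {

    /** Returns every boundary offset the JDK line BreakIterator reports for text. */
    static List<Integer> lineBreakOffsets(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    /** True if the iterator reports a boundary directly after the first slash. */
    static boolean breaksAfterSlash(String text) {
        int slash = text.indexOf('/');
        return slash >= 0 && lineBreakOffsets(text).contains(slash + 1);
    }

    public static void main(String[] args) {
        String sample = "input/output devices"; // arbitrary sample text
        System.out.println("boundaries: " + lineBreakOffsets(sample));
        System.out.println("break after slash: " + breaksAfterSlash(sample));
    }
}
```

If breaksAfterSlash returns false, the JDK in use does not treat U+002F
as a break opportunity in that context.
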
BTW, splitting the text on whitespace first and then applying the
BreakIterator is unwise, because whitespace is significant for TR14
line breaking. Unfortunately, combining whitespace normalization, line
break detection and word parsing (for hyphenation) in a single pass is
unwieldy with BreakIterator. That's why I tried to implement it
differently some time ago:
 http://people.apache.org/~pietsch/linebreak.tar.gz
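
To make the whitespace point concrete, the two approaches can be
compared side by side. This is a rough sketch, not the code from the
tarball above: the class and method names are invented, the "split
first" variant splits on U+0020 only, and the exact offsets depend on
the JDK release.

```java
import java.text.BreakIterator;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical comparison class, not part of FOP or the JDK.
class WholeTextVersusTokens {

    /** Boundaries reported when the iterator sees the whole text, spaces included. */
    static SortedSet<Integer> wholeTextOffsets(String text) {
        SortedSet<Integer> offsets = new TreeSet<>();
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text);
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    /** Boundaries reported when the text is first cut at spaces and each token is scanned alone. */
    static SortedSet<Integer> perTokenOffsets(String text) {
        SortedSet<Integer> offsets = new TreeSet<>();
        int i = 0;
        while (i < text.length()) {
            if (text.charAt(i) == ' ') {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && text.charAt(i) != ' ') {
                i++;
            }
            // The iterator never sees the surrounding spaces here.
            BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
            it.setText(text.substring(start, i));
            for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
                offsets.add(start + pos); // map back to offsets in the full text
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sample = "see input/output devices"; // arbitrary sample text
        System.out.println("whole text: " + wholeTextOffsets(sample));
        System.out.println("per token:  " + perTokenOffsets(sample));
    }
}
```

In the per-token variant the iterator has no view of the spaces, so any
rule that depends on them cannot apply, and the caller has to
reconstruct where the spaces belong relative to each break.
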

J.Pietschmann
