Manuel Mall wrote:
While investigating whether we could use the standard
java.text.BreakIterator to determine line break points, I noticed that
FOP treats the forward slash as a valid line breaking character, in
addition to space, zero width space and hyphen. The Java BreakIterator
does not recognize the slash as a line breaking character (nor, FWIW,
does MS Word).
What is the background to FOP allowing this? Is this consistent with
normal user expectations, or is this specific to typesetting
environments / TeX / Knuth?
The BreakIterator class is supposed to implement Unicode Standard
Annex #14 (TR14), the Unicode line breaking algorithm:
http://www.unicode.org/reports/tr14/
The slash, U+002F SOLIDUS, is assigned the line breaking property
value SY (Symbols Allowing Break After)
http://www.unicode.org/Public/UNIDATA/LineBreak.txt
which means "prevent a break before, and allow a break after". I suspect
this is a recent change in Unicode, not implemented yet by your JDK
release.
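A quick way to check what a given JDK actually does is to list the break opportunities it reports. The following is only a minimal sketch: the class name LineBreakProbe and the sample string are made up for illustration, and the result for the slash depends on the JDK release, so no output is claimed here.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of FOP: lists the line break
// opportunities reported by the JDK's own BreakIterator.
class LineBreakProbe {

    /** Returns every boundary after offset 0, including the end of the text. */
    static List<Integer> breakOffsets(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        it.first(); // position the iterator at offset 0
        for (int pos = it.next(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The slash sits at index 5, so an offset of 6 in the printed list
        // would mean this JDK allows a break after the slash.
        String sample = "input/output devices";
        System.out.println(breakOffsets(sample));
    }
}
```

If 6 is missing from the list, the JDK at hand does not treat the slash as a break opportunity in this context.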
BTW, splitting the text at whitespace first and then applying the
BreakIterator to the pieces is unwise, because whitespace is
significant for TR14 line breaking. Unfortunately, combining whitespace
normalization, line break detection and word parsing (for hyphenation)
in a single pass is unwieldy if BreakIterator is used, which is why I
tried to implement it differently some time ago:
http://people.apache.org/~pietsch/linebreak.tar.gz
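To illustrate the point about whitespace, here is a minimal sketch that hands the whole, unsplit text to the BreakIterator and cuts it at the reported break opportunities, so the iterator sees the spaces in context. This is not the code in the archive above; the class name WholeTextSegmenter is made up for illustration, and whitespace normalization and hyphenation would still have to be done in separate steps on the resulting segments.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: cuts the unsplit text at the break opportunities
// reported by the JDK, instead of pre-splitting it at whitespace.
class WholeTextSegmenter {

    /** Returns the pieces of text between consecutive break opportunities. */
    static List<String> segments(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Trailing spaces stay attached to the segment they follow.
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String segment : segments("hello  world")) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

Each segment keeps its trailing whitespace, so a later pass can collapse or discard it without having hidden it from the break detection.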
J.Pietschmann