Nicholas Moser created FOP-2963:
-----------------------------------
Summary: Add Option for Safer Hyphenation
Key: FOP-2963
URL: https://issues.apache.org/jira/browse/FOP-2963
Project: FOP
Issue Type: Improvement
Reporter: Nicholas Moser
Attachments: example-after-disabled.pdf, example-after-enabled.pdf,
example-before.pdf, patch.diff
This is a new proposed setting for FOP I have decided to call *safer
hyphenation*.
Currently, FOP may generate PDFs in which text overlaps other content or runs off the page. The
most common scenarios in which I've seen this happen are:
# A very small amount of space is allocated for text, such as a table cell.
Even if a word has valid hyphenation points, a sufficiently long word may still
overflow the cell because it does not have enough of them.
# A string of characters such as a number overflows the space allocated for
it even when there is plenty of room to break the line. This is because
hyphenation patterns do not define break points within strings of digits, so
FOP sees no valid hyphenation points.
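To illustrate the second scenario, here is a toy Liang-style pattern matcher. It is not FOP's hyphenator: the class name, the method name and the two patterns ("x1a", "m1p") are made up for this sketch, and it ignores word-boundary markers and the minimum remain/push character counts. Because the patterns contain only letters, none of them can match inside a string of digits, so no break points are produced.

```java
import java.util.ArrayList;
import java.util.List;

// Toy Liang-style hyphenator; the patterns are hypothetical, not FOP's.
class PatternHyphenationDemo {

    // Letters interleaved with digit priorities; an odd priority marks a break.
    static final String[] PATTERNS = {"x1a", "m1p"};

    // Returns break offsets (number of characters before each break).
    static List<Integer> hyphenationPoints(String word) {
        // priority[j] is the value for the gap just before word.charAt(j).
        int[] priority = new int[word.length() + 1];
        for (String pattern : PATTERNS) {
            StringBuilder letters = new StringBuilder();
            List<Integer> values = new ArrayList<>();
            values.add(0); // gap before the first letter
            for (char c : pattern.toCharArray()) {
                if (Character.isDigit(c)) {
                    values.set(values.size() - 1, c - '0');
                } else {
                    letters.append(c);
                    values.add(0); // gap after this letter
                }
            }
            String key = letters.toString();
            // Apply the pattern at every position where its letters occur.
            for (int i = word.indexOf(key); i >= 0; i = word.indexOf(key, i + 1)) {
                for (int k = 0; k < values.size(); k++) {
                    priority[i + k] = Math.max(priority[i + k], values.get(k));
                }
            }
        }
        List<Integer> points = new ArrayList<>();
        for (int j = 1; j < word.length(); j++) {
            if (priority[j] % 2 == 1) {
                points.add(j);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(hyphenationPoints("example"));    // [2, 4]
        System.out.println(hyphenationPoints("1234567890")); // []
    }
}
```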
Examples of these issues can be seen in the attached PDF *example-before.pdf*.
The third row of the first table has a very long word with many hyphenation
points. Despite this, it overflows the cell twice because there are not enough
hyphenation points. Additionally, the rows below it contain long series of
numbers that have no hyphenation points and run off the page.
My proposed fix is a new configuration setting called *safer hyphenation*,
which does three things:
# Places hyphenation points between every character in a string buffer,
ignoring hyphenation patterns.
# Moves hyphenation from the second pass to the third pass of
findOptimalBreakingPoints(...)
# Massively increases the penalty for hyphenation.
The first change is fairly simple: a hyphenation can now occur anywhere in any
word in the document. This fixes both problems, since such text now breaks onto
a new line before it leaves its allocated space. The drawback is that the
line-breaking algorithm will now try to use these new hyphenation points even
when they are not needed, which results in many ugly hyphenations. Since
hyphenation patterns are no longer used, I argue that the best way to handle
this is to avoid hyphenation unless it is absolutely necessary.
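A minimal sketch of the first change, assuming a single word is broken greedily. FOP actually uses a Knuth-Plass style total-fit algorithm, so this only illustrates the effect of the candidate points; the names everyCharacterPoints and wrap are hypothetical. With the pattern-derived points for a digit string (none), the word stays on one over-long line; with a point between every character, it fits a 4-character column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "a hyphenation point between every character".
class SaferHyphenationDemo {

    // Every inter-character offset becomes a candidate, ignoring patterns.
    static List<Integer> everyCharacterPoints(String word) {
        List<Integer> points = new ArrayList<>();
        for (int i = 1; i < word.length(); i++) {
            points.add(i);
        }
        return points;
    }

    // Greedy split into lines of at most `width` characters, hyphen included.
    // `points` must be ascending. If no point fits, the rest overflows.
    static List<String> wrap(String word, List<Integer> points, int width) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (word.length() - start > width) {
            int best = -1;
            for (int p : points) {
                // the line would hold word[start, p) plus one hyphen
                if (p > start && p - start + 1 <= width) {
                    best = p; // ascending order: the last match is the furthest
                }
            }
            if (best < 0) {
                break; // no usable break point: the remaining text overflows
            }
            lines.add(word.substring(start, best) + "-");
            start = best;
        }
        lines.add(word.substring(start));
        return lines;
    }

    public static void main(String[] args) {
        String digits = "1234567890";
        // Patterns give no points for digits, so the text overflows a 4-column cell.
        System.out.println(wrap(digits, List.of(), 4));                    // [1234567890]
        // With safer hyphenation every gap is a candidate.
        System.out.println(wrap(digits, everyCharacterPoints(digits), 4)); // [123-, 456-, 7890]
    }
}
```

The same candidates are what make the layout ugly when space is not tight: every gap is now a legal break, so something else has to discourage using them.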
The second and third changes enforce this. The second change only allows
hyphenation during the third pass of the optimal breaking point search, after
the maximum adjustment has been set to 20. The third change massively increases
the penalty for using a hyphenation, so the algorithm avoids hyphenating unless
there are no other options.
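The interplay of the second and third changes can be sketched with the textbook Knuth-Plass demerits, (line penalty + badness)^2 + penalty^2 with badness = 100|r|^3 for adjustment ratio r. This is an assumption for illustration only: FOP's BreakingAlgorithm may weigh these terms differently, the penalties 50 and 5000 are TeX's default and a made-up "massive" value respectively, and the max adjustment of 1 for the first two passes is assumed (the issue only states that the third pass uses 20). A line stretched to r = 2 is only feasible in the third pass; once there, a massive hyphenation penalty makes it cheaper than a well-filled hyphenated line, whereas a normal penalty does not.

```java
// Simplified Knuth-Plass demerits; FOP's exact formula and constants may differ.
class HyphenPenaltyDemo {

    static final double LINE_PENALTY = 10;           // TeX's default \linepenalty
    static final double NORMAL_HYPHEN_PENALTY = 50;  // TeX's default \hyphenpenalty
    static final double SAFER_HYPHEN_PENALTY = 5000; // hypothetical "massive" penalty

    // Badness of a line whose adjustment ratio is r.
    static double badness(double r) {
        return 100 * Math.pow(Math.abs(r), 3);
    }

    // Demerits of a line that ends at a break with a non-negative penalty.
    static double demerits(double r, double penalty) {
        double base = LINE_PENALTY + badness(r);
        return base * base + penalty * penalty;
    }

    // Max adjustment per pass: 20 on the third pass (per the issue), 1 otherwise (assumed).
    static double maxAdjustment(int pass) {
        return pass == 3 ? 20 : 1;
    }

    // Hypothetical schedule: safer hyphenation defers hyphenation to the third pass.
    static boolean hyphenationAllowed(int pass, boolean safer) {
        return pass >= (safer ? 3 : 2);
    }

    // A line is acceptable in a pass only if its ratio is within that pass's limit.
    static boolean feasible(double r, int pass) {
        return Math.abs(r) <= maxAdjustment(pass);
    }

    public static void main(String[] args) {
        double stretched = demerits(2.0, 0); // no hyphen, but stretched (r = 2)
        System.out.println(feasible(2.0, 2));            // false
        System.out.println(feasible(2.0, 3));            // true
        System.out.println(hyphenationAllowed(2, true)); // false
        // Normal penalty: the well-filled hyphenated line (r = 0.5) wins.
        System.out.println(demerits(0.5, NORMAL_HYPHEN_PENALTY) < stretched); // true
        // Massive penalty: the stretched, unhyphenated line wins.
        System.out.println(demerits(0.5, SAFER_HYPHEN_PENALTY) > stretched);  // true
    }
}
```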
Since this is a new configuration setting, I've included two additional PDFs,
*example-after-disabled.pdf* and *example-after-enabled.pdf*. The first PDF
demonstrates that when the setting is off, the changes are entirely passive and
make no difference to the output. The second PDF shows the improvements from
using safer hyphenation. It also shows the downside: pattern-based hyphenation
can no longer be used to improve the layout of a paragraph.