[ https://issues.apache.org/jira/browse/FOP-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas Moser updated FOP-2963:
--------------------------------
    Attachment: perf_improvements.patch
                example.fo

> Add Option for Safer Hyphenation
> --------------------------------
>
>                 Key: FOP-2963
>                 URL: https://issues.apache.org/jira/browse/FOP-2963
>             Project: FOP
>          Issue Type: Improvement
>            Reporter: Nicholas Moser
>            Priority: Major
>         Attachments: example-after-disabled.pdf, example-after-enabled.pdf, example-before.pdf, example.fo, patch.diff, perf_improvements.patch
>
>
> This is a new proposed setting for FOP that I have decided to call *safer hyphenation*.
> Currently, FOP may generate PDFs where text overlaps or runs off the page. The most common scenarios in which I've seen this occur are:
> # A very small amount of space is allocated for text, such as a table cell. Even if a word has valid hyphenation points, a sufficiently long word may still exit the cell because it does not have enough of them.
> # A string of characters such as numbers exits the space allocated for it even if there is plenty of room for a line break. This is because hyphenation patterns do not define break points for strings of numbers, so FOP sees no valid hyphenation points.
> Examples of these issues can be seen in the attached PDF *example-before.pdf*. The third row of the first table has a really long word with many hyphenation points. Despite this, it exits the cell twice because there are not enough hyphenation points. Additionally, the rows below it contain a long series of numbers that have no hyphenation points and go off the page.
> My proposed fix for this involves a new configuration setting called *safer hyphenation*. It effectively does three things:
> # Places hyphenation points between every character in a string buffer, ignoring hyphenation patterns.
> # Moves hyphenation from the second pass to the third pass of findOptimalBreakingPoints(...).
> # Massively increases the penalty for hyphenation.
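The three changes listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the attached patch or from FOP itself: the class name, method names, and penalty values are all hypothetical, and the assumption that hyphenation stays enabled in every pass after the first one that allows it is mine.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these names come from FOP or the patch. */
public class SaferHyphenationSketch {

    // Hypothetical values: the issue says the penalty is "massively"
    // increased but does not state the actual numbers.
    public static final int NORMAL_HYPHEN_PENALTY = 50;
    public static final int SAFER_HYPHEN_PENALTY = 5000;

    /**
     * Change 1: every gap between two chars is a candidate break,
     * regardless of hyphenation patterns. Works on UTF-16 code units;
     * real code would also have to avoid splitting surrogate pairs.
     */
    public static List<Integer> breakPoints(String word) {
        List<Integer> points = new ArrayList<>();
        for (int i = 1; i < word.length(); i++) {
            points.add(i);
        }
        return points;
    }

    /**
     * Change 2: hyphenation first becomes available in pass 3 instead of
     * pass 2 (passes numbered from 1). Assumes it remains available in
     * later passes once enabled.
     */
    public static boolean hyphenationAllowed(int pass, boolean saferHyphenation) {
        int firstHyphenationPass = saferHyphenation ? 3 : 2;
        return pass >= firstHyphenationPass;
    }

    /** Change 3: a much larger penalty makes hyphen breaks a last resort. */
    public static int hyphenPenalty(boolean saferHyphenation) {
        return saferHyphenation ? SAFER_HYPHEN_PENALTY : NORMAL_HYPHEN_PENALTY;
    }

    public static void main(String[] args) {
        // A digit string has no pattern-based break points, but with
        // change 1 it can break after any character.
        System.out.println(breakPoints("12345")); // prints [1, 2, 3, 4]
        System.out.println(hyphenationAllowed(2, true)); // prints false
        System.out.println(hyphenationAllowed(3, true)); // prints true
    }
}
```

The point of combining the three is that change 1 guarantees a break is always possible, while changes 2 and 3 keep the line breaker from using those breaks when an ordinary break would do.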
> The first change is fairly simple: a hyphenation can occur anywhere in any word in the document. This effectively fixes both of the problems, since words will now line break before they exit their allocated space. The issue is that the line breaking algorithm will now attempt to use these new hyphenation points even when they are not necessary, which results in many ugly hyphenations. Since hyphenation patterns are no longer used, I argue that the best way to handle this is to avoid hyphenation unless it is absolutely necessary.
> The second and third changes attempt to do exactly that. The second change only allows hyphenation during the third pass of the optimal breaking point search, after the max adjustment has been changed to 20. The third change massively increases the penalty for using a hyphenation. As a result, the algorithm avoids hyphenation unless there are no other options.
> Since this is a new configuration setting, I've included two additional PDFs, *example-after-disabled.pdf* and *example-after-enabled.pdf*. The first PDF shows that when the setting is off, the changes are entirely passive and cause no difference. The second PDF shows the improvements from using safer hyphenation. It also shows the downside: the old hyphenation (with hyphenation patterns) can no longer be used to improve the layout of a paragraph.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)