[ 
https://issues.apache.org/jira/browse/FOP-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Moser updated FOP-2963:
--------------------------------
    Attachment: perf_improvements.patch
                example.fo

> Add Option for Safer Hyphenation
> --------------------------------
>
>                 Key: FOP-2963
>                 URL: https://issues.apache.org/jira/browse/FOP-2963
>             Project: FOP
>          Issue Type: Improvement
>            Reporter: Nicholas Moser
>            Priority: Major
>         Attachments: example-after-disabled.pdf, example-after-enabled.pdf, 
> example-before.pdf, example.fo, patch.diff, perf_improvements.patch
>
>
> This is a new proposed setting for FOP I have decided to call *safer 
> hyphenation*.
> Currently, FOP may generate PDFs where text can overlap or go off the page. 
> The most common scenarios in which I've seen this occur are:
>  # A very small amount of space is allocated for text, such as a table 
> cell. Even if a word has valid hyphenation points, a sufficiently long word 
> may still overflow the cell because it does not have enough of them.
>  # A string of characters such as numbers overflows the space allocated for 
> it even when there is plenty of room for a line break. This is because 
> hyphenation patterns do not define break points for strings of numbers, so 
> FOP sees no valid hyphenation points.
> Examples of these issues can be seen in the attached PDF 
> *example-before.pdf*. The third row of the first table has a really long word 
> with many hyphenation points. Despite this, it overflows the cell twice 
> because there are not enough hyphenation points. Additionally, the rows below 
> it contain a long series of numbers that have no hyphenation points and go 
> off the page.
> My proposed fix for this involves a new configuration setting called *safer 
> hyphenation*. It effectively does three things.
>  # Places hyphenation points between every character in a string buffer, 
> ignoring hyphenation patterns.
>  # Moves hyphenation from the second pass to the third pass of 
> findOptimalBreakingPoints(...)
>  # Massively increases the penalty for hyphenation.
> The first change is fairly simple: a hyphenation can occur anywhere in any 
> word in the document. This effectively fixes both problems, since text will 
> now break before it leaves its allocated space. The drawback is that the 
> line breaking algorithm will attempt to use these new hyphenation points 
> even when it is not necessary, which results in many ugly hyphenations. 
> Since hyphenation patterns are no longer used, I argue that the best way to 
> handle this is to avoid hyphenation unless it is absolutely necessary.
> The second and third changes attempt to avoid hyphenation unless it is 
> absolutely necessary. The second change only allows hyphenation during the 
> third pass of the optimal breaking point search, after the max adjustment has 
> been changed to 20. The third change massively increases the penalty for 
> using a hyphenation. As a result, the algorithm avoids hyphenation unless 
> there are no other options.
> Since this is a new configuration setting, I've included two additional PDFs, 
> *example-after-disabled.pdf* and *example-after-enabled.pdf*. The first PDF 
> shows that when the setting is off, the changes are entirely passive and 
> make no difference. The second PDF shows the improvements from using safer 
> hyphenation. It also shows the downside: the old hyphenation (with 
> hyphenation patterns) can no longer be used to improve the layout of a 
> paragraph.
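
The idea described above (allow a break between any two characters, but use
such a break only as a last resort) can be illustrated with a small toy
sketch. This is not the code in the attached patch and it does not use FOP's
classes or its findOptimalBreakingPoints(...) passes. The class and method
names are hypothetical, widths are counted in characters rather than
millipoints, and a simple greedy fill stands in for FOP's penalty-based
search. Appending a hyphen to split numbers is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; names and behavior are assumptions, not FOP code. */
final class SaferHyphenationSketch {

    /** Every gap between two characters is treated as a candidate break. */
    static List<Integer> everyCharacterBreakPoints(String word) {
        List<Integer> points = new ArrayList<>();
        for (int i = 1; i < word.length(); i++) {
            points.add(i);
        }
        return points;
    }

    /**
     * Greedy wrap: break at spaces whenever possible and split a word only
     * when it cannot fit on a line of its own. This mimics the effect of a
     * very high hyphenation penalty without implementing a penalty model.
     */
    static List<String> wrap(String text, int width) {
        if (width < 2) {
            throw new IllegalArgumentException("width must be at least 2");
        }
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            int needed = current.length() == 0
                    ? word.length()
                    : current.length() + 1 + word.length();
            if (needed <= width) {
                // The word fits on the current line: no split required.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
                continue;
            }
            // Prefer a normal break at the space before this word.
            if (current.length() > 0) {
                lines.add(current.toString());
                current.setLength(0);
            }
            // Last resort: the word is wider than an empty line, so split it
            // at an arbitrary character position, leaving room for a hyphen.
            String remaining = word;
            while (remaining.length() > width) {
                lines.add(remaining.substring(0, width - 1) + "-");
                remaining = remaining.substring(width - 1);
            }
            current.append(remaining);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

For example, wrap("id 1234567890 ok", 6) splits only the number, because it
is the only token that cannot fit on a line by itself, while
wrap("a bb ccc", 10) keeps everything on one line and introduces no hyphen.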



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
