[jira] [Commented] (FOP-2963) Add Option for Safer Hyphenation

Nicholas Moser (Jira) Fri, 22 Jul 2022 08:46:05 -0700


    [ 
https://issues.apache.org/jira/browse/FOP-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570093#comment-17570093
 ]


Nicholas Moser commented on FOP-2963:
-------------------------------------

A heads-up for anyone interested in using this patch: it does increase memory 
usage, since many more Knuth nodes are created. To help relieve the memory 
pressure, I created an additional patch that reduces the number of object 
allocations. Specifically, I found many calls to log.debug(...) whose argument 
is built with String concatenation, which allocates a StringBuilder object 
even when debug logging is disabled. This can be avoided by first checking 
whether debug logging is enabled. For example:
{code:java}
-                log.debug("PLM> break - " + getBreakClassName(breakPenalty.getBreakClass()));
+                if (log.isDebugEnabled()) {
+                    log.debug("PLM> break - " + getBreakClassName(breakPenalty.getBreakClass()));
+                }
{code}
I've attached a patch fixing this to this JIRA: [^perf_improvements.patch]

These debug calls are also costly on the mainline branch of FOP, but the cost 
grows after applying the patch from this Jira, since there are more Knuth 
nodes and many of the log.debug(...) calls sit in a hot loop over those 
nodes.
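
To make the cost concrete, here is a minimal, self-contained sketch of the 
effect described above. It does not use FOP or its logging library: QuietLog 
and describe(...) are hypothetical stand-ins for a logger with debug disabled 
and for a helper like getBreakClassName(...). It only counts how often the 
message is built with and without the guard.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DebugGuardDemo {

    /** Hypothetical stand-in for a logger whose debug level is disabled. */
    static final class QuietLog {
        boolean isDebugEnabled() {
            return false;
        }

        void debug(Object message) {
            // Debug is off, so the already-built message is simply discarded.
        }
    }

    /** Counts how many times a log message was actually assembled. */
    static final AtomicInteger describeCalls = new AtomicInteger();

    /** Hypothetical stand-in for a helper such as getBreakClassName(...). */
    static String describe(int breakClass) {
        describeCalls.incrementAndGet();
        return "class-" + breakClass;
    }

    /** The argument is evaluated and concatenated on every iteration. */
    static int unguarded(QuietLog log, int iterations) {
        describeCalls.set(0);
        for (int i = 0; i < iterations; i++) {
            log.debug("PLM> break - " + describe(i));
        }
        return describeCalls.get();
    }

    /** The guard skips message construction while debug is disabled. */
    static int guarded(QuietLog log, int iterations) {
        describeCalls.set(0);
        for (int i = 0; i < iterations; i++) {
            if (log.isDebugEnabled()) {
                log.debug("PLM> break - " + describe(i));
            }
        }
        return describeCalls.get();
    }

    public static void main(String[] args) {
        QuietLog log = new QuietLog();
        System.out.println("unguarded message builds: " + unguarded(log, 1000));
        System.out.println("guarded message builds:   " + guarded(log, 1000));
    }
}
```

With debug disabled, the unguarded loop still builds the message on every 
iteration, while the guarded loop builds none. In a loop over Knuth nodes the 
same difference shows up as avoided allocations.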

I've also attached a .fo file I used to create the original PDFs on this JIRA: 
[^example.fo]

> Add Option for Safer Hyphenation
> --------------------------------
>
>                 Key: FOP-2963
>                 URL: https://issues.apache.org/jira/browse/FOP-2963
>             Project: FOP
>          Issue Type: Improvement
>            Reporter: Nicholas Moser
>            Priority: Major
>         Attachments: example-after-disabled.pdf, example-after-enabled.pdf, 
> example-before.pdf, example.fo, patch.diff, perf_improvements.patch
>
>
> This is a new proposed setting for FOP I have decided to call *safer 
> hyphenation*.
> Currently, FOP may generate PDFs where text can overlap or go off the page. 
> The most common scenarios I've seen this occur are:
>  # A very small amount of space is allocated for text, such as the cell of 
> a table. Even if there are valid hyphenation points for words, a sufficiently 
> large word may exit the cell as there aren't enough hyphenation points in it.
>  # A string of characters such as numbers will exit the space allocated for 
> them even if there is plenty of room to line break. This is because 
> hyphenation patterns do not set line breaks for strings of numbers, so FOP 
> finds no valid hyphenation points.
> Examples of these issues can be seen in the attached PDF 
> *example-before.pdf*. The third row on the first table has a really long word 
> with many hyphenation points. Despite this, it exits the cell twice due to 
> there not being enough hyphenation points. Additionally, the rows below this 
> row contain a long series of numbers that have no hyphenation points and go 
> off the page.
> My proposed fix for this involves a new configuration setting called *safer 
> hyphenation*. It effectively does three things.
>  # Places hyphenation points between every character in a string buffer, 
> ignoring hyphenation patterns.
>  # Moves hyphenation from the second pass to the third pass of 
> findOptimalBreakingPoints(...)
>  # Massively increases the penalty for hyphenation.
> The first change is fairly simple. A hyphenation can occur anywhere in any 
> word in the document. This effectively fixes both of the problems, since now 
> they will line break before they exit their allocated space. The issue is 
> that now, the line breaking algorithm will attempt to use these new 
> hyphenation points even when not necessary. This will result in many ugly 
> hyphenations. Since hyphenation patterns are no longer used, I argue that the 
> best way to handle this is to avoid hyphenation now unless it is absolutely 
> necessary.
> The second and third changes attempt to avoid hyphenation unless it is 
> absolutely necessary. The second change only allows hyphenation during the 
> third pass of the optimal breaking point search, after the max adjustment has 
> been changed to 20. The third change massively increases the penalty for 
> using a hyphenation. As a result, the algorithm avoids hyphenation 
> unless there are no other options.
> Since this is a new configuration setting, I've included two additional PDFs, 
> *example-after-disabled.pdf* and *example-after-enabled.pdf*. The first PDF 
> shows that when the setting is off, the changes are entirely passive 
> and make no difference. The second PDF shows the improvements of using safer 
> hyphenation. It also shows the downside, in that old hyphenation (with 
> hyphenation patterns) can no longer be used to improve the layout of a 
> paragraph.
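
The first change in the quoted description (a hyphenation point between every 
character) can be illustrated with a small standalone sketch. This is not the 
code from patch.diff and it does not use FOP's classes or its Knuth-style 
line breaker: the class name, method names, and the simple greedy breaking 
are assumptions made purely for illustration. The pass ordering and the 
raised penalty are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

public class SaferHyphenationSketch {

    /** Returns a break opportunity after every character except the last. */
    static int[] everyCharacterPoints(String word) {
        int count = Math.max(word.length() - 1, 0);
        int[] points = new int[count];
        for (int i = 0; i < count; i++) {
            points[i] = i + 1;
        }
        return points;
    }

    /**
     * Greedy illustration (not FOP's algorithm): break only when the rest of
     * the word does not fit, and take the latest point that still leaves room
     * for the trailing hyphen.
     */
    static List<String> breakToWidth(String word, int maxChars) {
        if (maxChars < 2) {
            throw new IllegalArgumentException("need room for one character and a hyphen");
        }
        int[] points = everyCharacterPoints(word);
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (word.length() - start > maxChars) {
            int best = -1;
            for (int p : points) {
                // One column is reserved for the hyphen.
                if (p > start && p - start <= maxChars - 1) {
                    best = p;
                }
            }
            lines.add(word.substring(start, best) + "-");
            start = best;
        }
        lines.add(word.substring(start));
        return lines;
    }

    public static void main(String[] args) {
        // A run of digits has no pattern-based hyphenation points,
        // yet it can still be kept inside a 4-character column.
        for (String line : breakToWidth("1234567890", 4)) {
            System.out.println(line);
        }
    }
}
```

Because every position is a candidate, a string of digits can always be 
broken before it leaves its column. That is also why the description pairs 
this change with a much higher hyphenation penalty: without it, breaks like 
these would be used far more often than necessary.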



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
