[
https://issues.apache.org/jira/browse/FOP-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570093#comment-17570093
]
Nicholas Moser commented on FOP-2963:
-------------------------------------
Just a heads-up for anyone interested in using this patch: it does increase
memory usage, since many more Knuth nodes are created. To help reduce memory
pressure, I created an additional patch that cuts down the number of object
allocations. Specifically, I found many calls to log.debug(...) whose argument
contains String concatenation, so temporary objects (such as a StringBuilder
and the resulting String) are allocated on every call, even when debug logging
is disabled. This can be avoided by first checking whether debug logging is
enabled. For example:
{code:java}
- log.debug("PLM> break - " +
-         getBreakClassName(breakPenalty.getBreakClass()));
+ if (log.isDebugEnabled()) {
+     log.debug("PLM> break - " +
+             getBreakClassName(breakPenalty.getBreakClass()));
+ }
{code}
I've attached a patch with these fixes to this JIRA: [^perf_improvements.patch]
These debug logs are also a problem on the mainline branch of FOP, but they are
an even bigger problem after applying the patch from this JIRA, since there are
more Knuth nodes and many of these log.debug(...) calls occur in a hot loop
over the Knuth nodes.
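To illustrate what the guard saves, here is a small, self-contained sketch. It uses java.util.logging (FINE standing in for "debug") rather than the Apache Commons Logging Log interface that FOP's log.isDebugEnabled() call comes from, and breakClassName(...) is a hypothetical stand-in for getBreakClassName(...) that only counts how often the message argument is actually built. It compares an unguarded call, a guarded call, and the Supplier-based lazy logging available on java.util.logging.Logger since Java 8.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Sketch: unguarded vs. guarded vs. lazy debug logging.
 * java.util.logging is used here; FINE plays the role of "debug".
 * breakClassName() is a hypothetical stand-in for FOP's getBreakClassName(...).
 */
final class GuardedDebugLogging {
    private static final Logger LOG = Logger.getLogger("PLM-sketch");
    /** Number of times the message argument was actually built. */
    static int evaluations = 0;

    private static String breakClassName(int breakClass) {
        evaluations++;                       // count each (possibly wasted) evaluation
        return "break-class-" + breakClass;
    }

    /** Unguarded: the message string is built even when FINE is disabled. */
    static void unguarded(int breakClass) {
        LOG.fine("PLM> break - " + breakClassName(breakClass));
    }

    /** Guarded: the message is only built when FINE would be logged. */
    static void guarded(int breakClass) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("PLM> break - " + breakClassName(breakClass));
        }
    }

    /** Lazy: the Supplier is only invoked when FINE would be logged. */
    static void lazy(int breakClass) {
        LOG.fine(() -> "PLM> break - " + breakClassName(breakClass));
    }

    /** Runs `calls` iterations of one variant with the logger at `level`. */
    static int countEvaluations(Level level, String variant, int calls) {
        LOG.setLevel(level);
        evaluations = 0;
        for (int i = 0; i < calls; i++) {
            switch (variant) {
                case "unguarded": unguarded(i); break;
                case "guarded":   guarded(i);   break;
                case "lazy":      lazy(i);      break;
                default: throw new IllegalArgumentException(variant);
            }
        }
        return evaluations;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"unguarded", "guarded", "lazy"}) {
            System.out.println(v + ": " + countEvaluations(Level.INFO, v, 1000));
        }
    }
}
```

With the logger at INFO, main prints 1000 for the unguarded variant and 0 for the other two. Commons Logging's Log.debug(Object) takes an already-built message, so the explicit isDebugEnabled() guard shown in the patch is the usual way to get the same effect there.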
I've also attached the .fo file I used to create the original PDFs attached to
this JIRA: [^example.fo]
> Add Option for Safer Hyphenation
> --------------------------------
>
> Key: FOP-2963
> URL: https://issues.apache.org/jira/browse/FOP-2963
> Project: FOP
> Issue Type: Improvement
> Reporter: Nicholas Moser
> Priority: Major
> Attachments: example-after-disabled.pdf, example-after-enabled.pdf,
> example-before.pdf, example.fo, patch.diff, perf_improvements.patch
>
>
> This is a proposed new setting for FOP that I have decided to call *safer
> hyphenation*.
> Currently, FOP may generate PDFs in which text overlaps or runs off the page.
> The most common scenarios in which I've seen this happen are:
> # A very small amount of space is allocated for text, such as a table cell.
> Even if a word has valid hyphenation points, a sufficiently long word may
> still overflow the cell because it does not have enough of them.
> # A string of characters such as digits overflows the space allocated for it
> even when there is plenty of room for a line break. This is because
> hyphenation patterns define no break points within strings of digits, so FOP
> sees no valid hyphenation points.
> Examples of these issues can be seen in the attached PDF
> *example-before.pdf*. The third row of the first table contains a very long
> word with many hyphenation points. Despite this, the word overflows the cell
> twice because it still has too few hyphenation points. Additionally, the rows
> below it contain long series of digits that have no hyphenation points and run
> off the page.
> My proposed fix is a new configuration setting called *safer hyphenation*.
> It does three things:
> # Places a hyphenation point between every pair of adjacent characters in a
> string buffer, ignoring hyphenation patterns.
> # Moves hyphenation from the second pass to the third pass of
> findOptimalBreakingPoints(...).
> # Massively increases the penalty for hyphenation.
> The first change is fairly simple: a hyphenation can now occur anywhere in any
> word in the document. This fixes both problems, since words and digit strings
> now break before they overflow their allocated space. The drawback is that the
> line-breaking algorithm will also use these new hyphenation points when they
> are not needed, resulting in many ugly hyphenations. Since hyphenation
> patterns are no longer used, I argue that the best way to handle this is to
> avoid hyphenation unless it is absolutely necessary.
> The second and third changes implement this. The second change only allows
> hyphenation during the third pass of the optimal breaking point search, after
> the max adjustment has been changed to 20. The third change massively
> increases the penalty for using a hyphenation. As a result, the algorithm
> avoids hyphenation unless there is no other option.
> Since this is a new configuration setting, I've included two additional PDFs,
> *example-after-disabled.pdf* and *example-after-enabled.pdf*. The first PDF
> shows that when the setting is off, the changes are entirely passive and make
> no difference. The second PDF shows the improvements from using safer
> hyphenation. It also shows the downside: pattern-based hyphenation can no
> longer be used to improve the layout of a paragraph.
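The three changes in the issue description can be sketched as below. This is not the code from patch.diff: the class name, everyCharacterPoints, hyphenationAllowed, hyphenPenalty and all numeric values are hypothetical, and the pass numbering (1 to 3, with the third pass using the larger max adjustment of 20) is assumed from the description. The demerits function uses the classic Knuth-Plass formula, (line penalty + badness)^2 + penalty^2, to show why a very large hyphenation penalty makes the breaker prefer a loose unhyphenated line; FOP's own demerits calculation may differ in detail.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/**
 * Hypothetical sketch of "safer hyphenation"; names and numbers are
 * illustrative and are not taken from FOP or from the attached patch.
 */
final class SaferHyphenationSketch {
    /** Assumed threshold above which a break would be forbidden (not enforced here). */
    static final int INFINITE_PENALTY = 1000;
    /** Assumed "normal" hyphenation penalty (TeX's default \hyphenpenalty). */
    static final int NORMAL_HYPHEN_PENALTY = 50;
    /** Assumed "massively increased" penalty, kept below INFINITE_PENALTY. */
    static final int SAFER_HYPHEN_PENALTY = 900;
    /** Assumed line penalty (TeX's default \linepenalty). */
    static final int LINE_PENALTY = 10;

    /**
     * Change 1: a hyphenation point between every pair of adjacent characters,
     * ignoring patterns. Positions are UTF-16 indices; positions that would
     * split a surrogate pair are skipped.
     */
    static int[] everyCharacterPoints(String word) {
        return IntStream.range(1, word.length())
                .filter(i -> !Character.isLowSurrogate(word.charAt(i)))
                .toArray();
    }

    /** Change 2: with safer hyphenation, hyphenated breaks only on pass 3. */
    static boolean hyphenationAllowed(boolean saferHyphenation, int pass) {
        return saferHyphenation ? pass >= 3 : pass >= 2;
    }

    /** Change 3: a much larger penalty for hyphenated breaks when enabled. */
    static int hyphenPenalty(boolean saferHyphenation) {
        return saferHyphenation ? SAFER_HYPHEN_PENALTY : NORMAL_HYPHEN_PENALTY;
    }

    /** Classic Knuth-Plass demerits for a non-negative, non-forced penalty. */
    static long demerits(int badness, int penalty) {
        long lineTerm = LINE_PENALTY + badness;
        return lineTerm * lineTerm + (long) penalty * penalty;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(everyCharacterPoints("12345")));
        // A loose line without a hyphen vs. a tight line ending in a hyphen.
        long loose = demerits(100, 0);
        long hyphenNormal = demerits(0, hyphenPenalty(false));
        long hyphenSafer = demerits(0, hyphenPenalty(true));
        System.out.println("loose=" + loose + " normal=" + hyphenNormal
                + " safer=" + hyphenSafer);
    }
}
```

Running main prints [1, 2, 3, 4] and then loose=12100 normal=2600 safer=810100. In this toy comparison, the assumed normal penalty lets the hyphenated line win, while the safer penalty makes the loose unhyphenated line far cheaper, so a hyphenated break is only worth taking when no acceptable unhyphenated break exists.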
--
This message was sent by Atlassian Jira
(v8.20.10#820010)