Dan Caprioara created FOP-2880:
----------------------------------
Summary: [PATCH] Hyphenated words are not searchable in readers
Key: FOP-2880
URL: https://issues.apache.org/jira/browse/FOP-2880
Project: FOP
Issue Type: Bug
Components: unqualified
Affects Versions: 2.3
Reporter: Dan Caprioara
FOP renders hyphenated words using the hard hyphen character (U+002D). This
contradicts the PDF specification, which in section
14.8.2.2.3 Incidental Artifacts
clearly states that the soft hyphen (SHY, U+00AD) should be used for hyphens
introduced by line breaking.
As a result, hyphenated words are not searchable, and copy/paste includes the
hard hyphens instead of removing them and joining the word pieces together.
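To illustrate the effect, here is a minimal sketch of how a reader might rebuild text from extracted lines. It is not taken from any real PDF reader; the class name SoftHyphenJoinSketch and the method joinLines are hypothetical. The assumption is that the reader drops a trailing U+00AD and joins the two pieces, but keeps a trailing U+002D because it may be part of the word.

```java
import java.util.List;

public class SoftHyphenJoinSketch {
    static final char SOFT_HYPHEN = '\u00AD';

    // Hypothetical extractor logic: a line ending in a soft hyphen is glued
    // to the next line without the hyphen; any other line is followed by a space.
    static String joinLines(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            boolean last = (i == lines.size() - 1);
            boolean endsWithShy = !line.isEmpty()
                    && line.charAt(line.length() - 1) == SOFT_HYPHEN;
            if (!last && endsWithShy) {
                // Drop the soft hyphen and join the word pieces.
                sb.append(line, 0, line.length() - 1);
            } else {
                sb.append(line);
                if (!last) {
                    sb.append(' ');
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String soft = joinLines(List.of("hyphen\u00AD", "ation works"));
        String hard = joinLines(List.of("hyphen-", "ation works"));
        System.out.println(soft.contains("hyphenation")); // true
        System.out.println(hard.contains("hyphenation")); // false
    }
}
```

With the soft hyphen the search for "hyphenation" succeeds; with the hard hyphen the extracted text is "hyphen- ation works" and the search fails.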
Here is a small patch against the FOP core project that fixes this:
{code}
Index: src/main/java/org/apache/fop/fo/FOPropertyMapping.java
===================================================================
--- src/main/java/org/apache/fop/fo/FOPropertyMapping.java (revision 190759)
+++ src/main/java/org/apache/fop/fo/FOPropertyMapping.java (working copy)
@@ -1106,7 +1106,10 @@
// hyphenation-character
m = new CharacterProperty.Maker(PR_HYPHENATION_CHARACTER);
m.setInherited(true);
- m.setDefault("-");
+
+ // m.setDefault("-");
+ m.setDefault("\u00ad");
+
addPropertyMaker("hyphenation-character", m);
// hyphenation-push-character-count
Index: src/main/java/org/apache/fop/render/pdf/PDFPainter.java
===================================================================
--- src/main/java/org/apache/fop/render/pdf/PDFPainter.java (revision 190759)
+++ src/main/java/org/apache/fop/render/pdf/PDFPainter.java (working copy)
@@ -420,7 +420,8 @@
PDFStructElem structElem = (PDFStructElem) getContext().getStructureTreeElement();
languageAvailabilityChecker.checkLanguageAvailability(text);
MarkedContentInfo mci = logicalStructureHandler.addTextContentItem(structElem);
- String actualText = getContext().isHyphenated() ? text.substring(0, text.length() - 1) : null;
+// String actualText = getContext().isHyphenated() ? text.substring(0, text.length() - 1) : null;
+ String actualText = null;
generator.endTextObject();
generator.updateColor(state.getTextColor(), true, null);
generator.beginTextObject(mci.tag, mci.mcid, actualText);
@@ -490,6 +491,14 @@
float glyphAdjust = 0;
if (font.hasCodePoint(orgChar)) {
ch = font.mapCodePoint(orgChar);
+ if (orgChar == '\u00ad'){
+ // Map it back to the SHY, the hard hyphen is not correct, causes the hyphenated words not being searchable.
+ // See the PDF Spec: 14.8.2.2.3 Incidental Artifacts / Hyphenation paragraph.
+
+ // The ansi encoding CodePointMapping has the hyphenation char with two entries,
+ // the first is selected, the hard hyphen. Reverting...
+ ch = orgChar;
+ }
ch = selectAndMapSingleByteFont(tf, fontName, fontSize, textutil, ch);
if ((wordSpacing != 0) && CharUtilities.isAdjustableSpace(orgChar)) {
glyphAdjust += wordSpacing;
{code}
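The comment in the second PDFPainter hunk refers to an encoding table in which the hyphen has two entries and the first one wins. The following is only a sketch of that kind of lossy round trip. It does not use FOP's CodePointMapping class; the table contents, the class name LossyEncodingSketch and the methods encode and decode are assumptions made for illustration.

```java
public class LossyEncodingSketch {
    // Hypothetical table of {encoded byte, Unicode code point} pairs.
    // Both hyphens share one encoded byte; the hard hyphen is listed first.
    static final int[][] TABLE = {
        {0x2D, 0x002D}, // hard hyphen
        {0x2D, 0x00AD}, // soft hyphen, same encoded byte (assumed)
        {0x41, 0x0041}, // 'A'
    };

    // Unicode code point -> encoded byte, or -1 if not in the table.
    static int encode(int unicode) {
        for (int[] pair : TABLE) {
            if (pair[1] == unicode) {
                return pair[0];
            }
        }
        return -1;
    }

    // Encoded byte -> Unicode code point; the first matching entry is selected.
    static int decode(int code) {
        for (int[] pair : TABLE) {
            if (pair[0] == code) {
                return pair[1];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int roundTrip = decode(encode(0x00AD));
        // The soft hyphen comes back as the hard hyphen.
        System.out.println(Integer.toHexString(roundTrip)); // 2d
    }
}
```

Under these assumptions a soft hyphen that goes through the table comes back as U+002D, which is the situation the patch works around by restoring the original character.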