https://issues.apache.org/bugzilla/show_bug.cgi?id=54081
Priority: P2
Bug ID: 54081
Assignee: [email protected]
Summary: Properly tag hyphenated words
Severity: normal
Classification: Unclassified
OS: All
Reporter: [email protected]
Hardware: All
Status: NEW
Version: all
Component: pdf
Product: Fop
If a hyphenated word is stored as-is in the PDF output, a screen reader will
read it differently from the same word without hyphenation. This can result in
incomprehensible text.
To fix that problem, a hyphenated word should be tagged as such. This can be
done in two ways.
The first possibility is to add an 'ActualText' entry to the property list of
the corresponding marked-content sequence. Its value would be the text of that
sequence minus the trailing hyphen character.
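
A minimal sketch of the first possibility, to make the idea concrete. This is
not FOP code: the class and method names are hypothetical, and it only handles
text that fits in a PDF literal string (non-ASCII text would need a UTF-16BE
encoded string instead). It strips the trailing hyphenation character and
builds the operator that opens a 'Span' marked-content sequence carrying the
'ActualText' entry.

```java
// Hypothetical helper, not part of FOP.
final class ActualTextSketch {

    // Returns the text without its trailing hyphenation character, if present.
    static String actualTextFor(String lineText, char hyphenationChar) {
        int last = lineText.length() - 1;
        if (last >= 0 && lineText.charAt(last) == hyphenationChar) {
            return lineText.substring(0, last);
        }
        return lineText;
    }

    // Escapes the characters that are special inside a PDF literal string.
    static String escapePdfLiteral(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds the operator that begins the marked-content sequence; the
    // sequence must later be closed with EMC.
    static String beginSpan(String actualText) {
        return "/Span <</ActualText (" + escapePdfLiteral(actualText) + ")>> BDC";
    }
}
```

For a line ending in "hyphen-", the content stream would then contain the
string returned by beginSpan("hyphen"), followed by the text-showing operators
for the visible text and a closing EMC.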
The second possibility is to replace the last hyphen with a soft hyphen
character (U+00AD), which screen readers recognize, so that the split word is
read as one. This works only if the font has a glyph for the soft hyphen
character.
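
A minimal sketch of the second possibility, again with hypothetical names. The
fontHasGlyph predicate is an assumption that stands in for whatever glyph
lookup the font implementation provides. It also assumes the hyphenation
character is the ASCII hyphen.

```java
import java.util.function.IntPredicate;

// Hypothetical helper, not part of FOP.
final class SoftHyphenSketch {

    static final char SOFT_HYPHEN = '\u00AD';

    // Replaces a trailing '-' with U+00AD when the font can render it;
    // otherwise returns the text unchanged.
    static String withSoftHyphen(String lineText, IntPredicate fontHasGlyph) {
        if (lineText.endsWith("-") && fontHasGlyph.test(SOFT_HYPHEN)) {
            return lineText.substring(0, lineText.length() - 1) + SOFT_HYPHEN;
        }
        return lineText;
    }
}
```

When the predicate reports no glyph for U+00AD, the text is left untouched so
that the caller can apply a different approach.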
The latter possibility is the recommended way to handle hyphenated words. The
former can be implemented as a fallback for when there is no available glyph
for the soft hyphen, or when the hyphenation character is not actually a hyphen
(this can be customized through the hyphenation-character property).
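
The choice between the two approaches could be sketched as follows. The names
are hypothetical, and treating only U+002D and U+2010 as "actually a hyphen"
is an assumption made for illustration.

```java
// Hypothetical helper, not part of FOP.
final class HyphenTaggingChoice {

    enum Strategy { SOFT_HYPHEN, ACTUAL_TEXT }

    // Prefers the soft hyphen; falls back to ActualText when the font has no
    // glyph for U+00AD or the hyphenation character is not a hyphen.
    static Strategy choose(char hyphenationChar, boolean fontHasSoftHyphenGlyph) {
        boolean isHyphen = hyphenationChar == '-' || hyphenationChar == '\u2010';
        return (isHyphen && fontHasSoftHyphenGlyph)
                ? Strategy.SOFT_HYPHEN
                : Strategy.ACTUAL_TEXT;
    }
}
```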