Daniel Gredler created PDFBOX-5230:
--------------------------------------
Summary: Zero-width non-joiner characters visible in generated PDF
Key: PDFBOX-5230
URL: https://issues.apache.org/jira/browse/PDFBOX-5230
Project: PDFBox
Issue Type: Bug
Components: FontBox, PDModel, Writing
Affects Versions: 2.0.16
Reporter: Daniel Gredler
Attachments: zwnj.pdf
I'd like to use the [zero-width
non-joiner|https://en.wikipedia.org/wiki/Zero-width_non-joiner] (ZWNJ)
character to prevent character shaping in some cases when using Arabic and
Indic scripts. This works correctly using some fonts like Arial Unicode
(character shaping is prevented and no ZWNJ glyph is visible in the PDF), but
does not work correctly when using fonts like Tahoma or Google Noto Sans
Regular, where the ZWNJ character is visible in the PDF. The ZWNJ glyph is not
visible when using these fonts in other programs, like Microsoft Word.
I suspect that the `advanceWidth` values in the `hmtx` table should be taken
into account but are not: the `advanceWidth` for the ZWNJ glyph is 0 in both
of the fonts that produce a visible artifact for this character (Tahoma and
Google Noto Sans Regular).
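For reference, here is a minimal sketch of how an `advanceWidth` is looked up
in raw `hmtx` data, as I understand the TrueType/OpenType table layout. It is
not PDFBox code: the class name `HmtxAdvance`, the method `advanceWidth` and
the synthetic byte array are made up for illustration, and the
`numberOfHMetrics` value is assumed to have been read from the `hhea` table
beforehand.

```java
import java.nio.ByteBuffer;

public class HmtxAdvance {

    /**
     * Returns the advanceWidth (in font design units) for a glyph ID,
     * given the raw bytes of an hmtx table.
     *
     * The table starts with numberOfHMetrics records of 4 bytes each
     * (uint16 advanceWidth, int16 leftSideBearing). Glyph IDs beyond
     * the last record reuse the advanceWidth of the last record.
     */
    static int advanceWidth(byte[] hmtx, int numberOfHMetrics, int glyphId) {
        if (glyphId < 0) {
            throw new IllegalArgumentException("negative glyph ID");
        }
        if (numberOfHMetrics <= 0 || hmtx.length < numberOfHMetrics * 4) {
            throw new IllegalArgumentException("hmtx table too short");
        }
        int record = Math.min(glyphId, numberOfHMetrics - 1);
        // ByteBuffer is big-endian by default, like TrueType data.
        // Mask to read the signed short as an unsigned 16-bit value.
        return ByteBuffer.wrap(hmtx).getShort(record * 4) & 0xFFFF;
    }

    public static void main(String[] args) {
        // Synthetic table with two records (illustrative values only):
        // glyph 0: advanceWidth 600, lsb 50; glyph 1: advanceWidth 0, lsb 0
        byte[] hmtx = {0x02, 0x58, 0x00, 0x32, 0x00, 0x00, 0x00, 0x00};
        System.out.println(advanceWidth(hmtx, 2, 0));
        System.out.println(advanceWidth(hmtx, 2, 1));
    }
}
```

A glyph whose record yields 0 here, as the ZWNJ glyph does in Tahoma and Noto
Sans, is the case I would expect to produce no visible mark.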
Test case generating the attached PDF file:
{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class ZwnjTest {
    public static void main(String[] args) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.LETTER);
            document.addPage(page);
            try (PDPageContentStream stream = new PDPageContentStream(document, page)) {
                // Tahoma: ZWNJ glyph is a vertical bar, but advanceWidth in
                // hmtx table is 0 -> shown in PDF anyway (unexpected)
                PDFont tahoma = PDType0Font.load(document,
                        new File("C:/Windows/Fonts/tahoma.ttf"));
                stream.beginText();
                stream.setFont(tahoma, 20);
                stream.newLineAtOffset(50, 650);
                // U+200C = zero width non-joiner
                stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C1");
                stream.endText();

                // Arial Unicode: ZWNJ glyph contains no outline -> not shown
                // in PDF (as expected)
                PDFont arialu = PDType0Font.load(document,
                        new File("C:/Windows/Fonts/ARIALUNI.TTF"));
                stream.beginText();
                stream.setFont(arialu, 20);
                stream.newLineAtOffset(50, 600);
                // U+200C = zero width non-joiner
                stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C2");
                stream.endText();

                // Google Noto Sans Regular: ZWNJ glyph is a vertical bar, but
                // advanceWidth in hmtx table is 0 -> shown in PDF anyway (unexpected)
                PDFont gnotos = PDType0Font.load(document,
                        new File("noto-sans-regular.ttf"));
                stream.beginText();
                stream.setFont(gnotos, 20);
                stream.newLineAtOffset(50, 550);
                // U+200C = zero width non-joiner
                stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C3");
                stream.endText();
            }
            document.save("zwnj.pdf");
        }
    }
}
{code}