Daniel Gredler created PDFBOX-5230:
--------------------------------------

             Summary: Zero-width non-joiner characters visible in generated PDF
                 Key: PDFBOX-5230
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5230
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox, PDModel, Writing
    Affects Versions: 2.0.16
            Reporter: Daniel Gredler
         Attachments: zwnj.pdf

I'd like to use the [zero-width 
non-joiner|https://en.wikipedia.org/wiki/Zero-width_non-joiner] (ZWNJ) 
character to prevent character shaping in some cases when using Arabic and 
Indic scripts. This works correctly using some fonts like Arial Unicode 
(character shaping is prevented and no ZWNJ glyph is visible in the PDF), but 
does not work correctly when using fonts like Tahoma or Google Noto Sans 
Regular, where the ZWNJ character is visible in the PDF. The ZWNJ glyph is not 
visible when using these fonts in other programs, like Microsoft Word.

I suspect that the `advanceWidth` values in the `hmtx` table should be taken 
into account somehow but currently are not: the `advanceWidth` of the ZWNJ 
glyph is 0 in both of the fonts that erroneously produce a visible artifact 
for the ZWNJ character (Tahoma and Google Noto Sans Regular).
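
For reference, here is a minimal sketch of how an `advanceWidth` is looked up in raw `hmtx` data according to the OpenType table layout. It is not PDFBox or FontBox code: the class `HmtxSketch`, the method `advanceWidth` and the synthetic table in `main` are hypothetical and for illustration only. In a real font the glyph ID for U+200C would come from the `cmap` table and `numberOfHMetrics` from the `hhea` table.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class HmtxSketch {

    /**
     * Returns the advance width (in font units) of a glyph from raw hmtx bytes.
     * The table starts with numberOfHMetrics records of 4 bytes each
     * (uint16 advanceWidth, int16 leftSideBearing). Glyphs beyond the last
     * record only store a left side bearing and reuse the last advance width.
     */
    static int advanceWidth(byte[] hmtx, int numberOfHMetrics, int glyphId) {
        if (numberOfHMetrics < 1 || glyphId < 0) {
            throw new IllegalArgumentException("invalid numberOfHMetrics or glyphId");
        }
        ByteBuffer buf = ByteBuffer.wrap(hmtx).order(ByteOrder.BIG_ENDIAN);
        int record = Math.min(glyphId, numberOfHMetrics - 1);
        // advanceWidth is an unsigned 16-bit value, so mask off the sign extension
        return buf.getShort(record * 4) & 0xFFFF;
    }

    public static void main(String[] args) {
        // Synthetic table (assumption): 3 full records, then 2 bearing-only entries
        ByteBuffer b = ByteBuffer.allocate(3 * 4 + 2 * 2);
        b.putShort((short) 1000).putShort((short) 50); // glyph 0
        b.putShort((short) 0).putShort((short) 0);     // glyph 1: zero advance width
        b.putShort((short) 600).putShort((short) 20);  // glyph 2
        b.putShort((short) 10).putShort((short) 15);   // bearings only for glyphs 3 and 4
        byte[] hmtx = b.array();

        System.out.println(advanceWidth(hmtx, 3, 1)); // zero-width glyph
        System.out.println(advanceWidth(hmtx, 3, 4)); // reuses the advance of glyph 2
    }
}
```

The advance width only describes how far the pen moves after the glyph; whether a glyph that has an outline is painted is a separate question, which may be why the vertical bar still shows up.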

Test case generating the attached PDF file:
{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class ZwnjTest {
    public static void main(String[] args) throws IOException {
        try (PDDocument document = new PDDocument()) {

            PDPage page = new PDPage(PDRectangle.LETTER);
            document.addPage(page);

            try (PDPageContentStream stream = new PDPageContentStream(document, page)) {

                // Tahoma: ZWNJ glyph is a vertical bar, but advanceWidth in
                // hmtx table is 0 -> shown in PDF anyway (unexpected)
                PDFont tahoma = PDType0Font.load(document, new File("C:/Windows/Fonts/tahoma.ttf"));
                stream.beginText();
                stream.setFont(tahoma, 20);
                stream.newLineAtOffset(50, 650);
                // U+200C = zero width non-joiner
                stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C1");
                stream.endText();

                // Arial Unicode: ZWNJ glyph contains no outline -> not shown
                // in PDF (as expected)
                PDFont arialu = PDType0Font.load(document, new File("C:/Windows/Fonts/ARIALUNI.TTF"));
                stream.beginText();
                stream.setFont(arialu, 20);
                stream.newLineAtOffset(50, 600);
                // U+200C = zero width non-joiner
                stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C2");
                stream.endText();

                // Google Noto Sans Regular: ZWNJ glyph is a vertical bar, but
                // advanceWidth in hmtx table is 0 -> shown in PDF anyway (unexpected)
                PDFont gnotos = PDType0Font.load(document, new File("noto-sans-regular.ttf"));
                stream.beginText();
                stream.setFont(gnotos, 20);
                stream.newLineAtOffset(50, 550);
                // U+200C = zero width non-joiner
                stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C3");
                stream.endText();
            }

            document.save("zwnj.pdf");
        }
    }
}
{code}
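
To double-check which code points in the test strings above are invisible format characters, a small stdlib-only sketch like the following can help. The class `FormatCharSketch` and its method are hypothetical names for illustration and are not part of PDFBox. It relies on the fact that U+200C belongs to the Unicode general category Cf, which Java reports as `Character.FORMAT`.

```java
final class FormatCharSketch {

    /** Counts code points whose Unicode general category is Cf (format). */
    static int countFormatChars(String text) {
        return (int) text.codePoints()
                .filter(cp -> Character.getType(cp) == Character.FORMAT)
                .count();
    }

    public static void main(String[] args) {
        // Same string as the first showText call in the test case
        String sample = "t\u200Ce\u200Cs\u200Ct\u200C \u200C1";
        System.out.println("code points:  " + sample.codePointCount(0, sample.length()));
        System.out.println("format chars: " + countFormatChars(sample));
    }
}
```

Each test string contains five ZWNJ characters, so up to five stray vertical bars per line would be expected with the affected fonts.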


