[ 
https://issues.apache.org/jira/browse/PDFBOX-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375103#comment-17375103
 ] 

Daniel Gredler commented on PDFBOX-5230:
----------------------------------------

Yep, that seems to be the behavior for PDFKit.

I also tried the same using standard Java2D, and the ZWNJ chars are also not 
displayed – code below, output attached (zwnj.png).
{code:java}
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class ZwnjG2dTest {
    public static void main(String[] args) throws Exception {

        BufferedImage img = new BufferedImage(500, 300, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = img.createGraphics();
        g2d.setRenderingHint(RenderingHints.KEY_FRACTIONALMETRICS, RenderingHints.VALUE_FRACTIONALMETRICS_ON);
        g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        // White background, black text
        g2d.setColor(Color.WHITE);
        g2d.fillRect(0, 0, img.getWidth(), img.getHeight());
        g2d.setColor(Color.BLACK);

        // U+200C = zero width non-joiner, inserted between every character below
        Font tahoma = Font.createFont(Font.TRUETYPE_FONT, new File("C:/Windows/Fonts/tahoma.ttf")).deriveFont(50f);
        g2d.setFont(tahoma);
        g2d.drawString("t\u200Ce\u200Cs\u200Ct\u200C \u200C1", 50, 50);

        Font arial = Font.createFont(Font.TRUETYPE_FONT, new File("C:/Windows/Fonts/ARIALUNI.TTF")).deriveFont(50f);
        g2d.setFont(arial);
        g2d.drawString("t\u200Ce\u200Cs\u200Ct\u200C \u200C2", 50, 100);

        Font noto = Font.createFont(Font.TRUETYPE_FONT, new File("noto-sans-regular.ttf")).deriveFont(50f);
        g2d.setFont(noto);
        g2d.drawString("t\u200Ce\u200Cs\u200Ct\u200C \u200C3", 50, 150);

        g2d.dispose(); // release the graphics context before writing the image
        ImageIO.write(img, "png", new File("zwnj.png"));
    }
}
{code}
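
For anyone reproducing this: the test strings place U+200C between every character of "test 1", "test 2" and "test 3". Below is a minimal sketch of a helper that builds such strings and strips the ZWNJ characters again, which may be handy for comparing output with and without them. The class and method names (ZwnjStrings, interleave, strip, count) are made up for illustration and are not part of PDFBox or Java2D.

```java
final class ZwnjStrings {
    /** U+200C ZERO WIDTH NON-JOINER. */
    static final char ZWNJ = '\u200C';

    /** Inserts one ZWNJ between every pair of adjacent code points. */
    public static String interleave(String text) {
        StringBuilder sb = new StringBuilder();
        int[] codePoints = text.codePoints().toArray();
        for (int i = 0; i < codePoints.length; i++) {
            if (i > 0) {
                sb.append(ZWNJ);
            }
            sb.appendCodePoint(codePoints[i]);
        }
        return sb.toString();
    }

    /** Removes every ZWNJ from the text. */
    public static String strip(String text) {
        return text.replace(String.valueOf(ZWNJ), "");
    }

    /** Counts the ZWNJ characters in the text. */
    public static long count(String text) {
        return text.chars().filter(c -> c == ZWNJ).count();
    }

    public static void main(String[] args) {
        String s = interleave("test 1");
        System.out.println(count(s)); // number of ZWNJ characters inserted
        System.out.println(strip(s)); // original text without ZWNJ
    }
}
```

Note that stripping is only useful as a comparison aid here: it also removes the shaping hint that the ZWNJ is meant to carry for Arabic and Indic scripts.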

> Zero-width non-joiner characters visible in generated PDF
> ---------------------------------------------------------
>
>                 Key: PDFBOX-5230
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5230
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox, PDModel, Writing
>    Affects Versions: 2.0.16
>            Reporter: Daniel Gredler
>            Priority: Major
>         Attachments: Af.pdf, zwnj-pdfkit.pdf, zwnj.pdf, zwnj.png
>
>
> I'd like to use the [zero-width 
> non-joiner|https://en.wikipedia.org/wiki/Zero-width_non-joiner] (ZWNJ) 
> character to prevent character shaping in some cases when using Arabic and 
> Indic scripts. This works correctly using some fonts like Arial Unicode 
> (character shaping is prevented and no ZWNJ glyph is visible in the PDF), but 
> does not work correctly when using fonts like Tahoma or Google Noto Sans 
> Regular, where the ZWNJ character is visible in the PDF. The ZWNJ glyph is 
> not visible when using these fonts in other programs, like Microsoft Word.
> I suspect that the `advanceWidth` values in the `hmtx` table should be 
> taken into account somehow but are not: the `advanceWidth` for the ZWNJ 
> glyph is 0 in both of the fonts that erroneously produce a visible ZWNJ 
> glyph (Tahoma and Google Noto Sans Regular).
> Test case generating the attached PDF file:
> {code:java}
> import java.io.File;
> import java.io.IOException;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDPageContentStream;
> import org.apache.pdfbox.pdmodel.common.PDRectangle;
> import org.apache.pdfbox.pdmodel.font.PDFont;
> import org.apache.pdfbox.pdmodel.font.PDType0Font;
>
> public class ZwnjTest {
>     public static void main(String[] args) throws IOException {
>         try (PDDocument document = new PDDocument()) {
>             PDPage page = new PDPage(PDRectangle.LETTER);
>             document.addPage(page);
>             // U+200C = zero width non-joiner, inserted between every character below
>             try (PDPageContentStream stream = new PDPageContentStream(document, page)) {
>                 // Tahoma: ZWNJ glyph is a vertical bar, but advanceWidth in
>                 // hmtx table is 0 -> shown in PDF anyway (unexpected)
>                 PDFont tahoma = PDType0Font.load(document, new File("C:/Windows/Fonts/tahoma.ttf"));
>                 stream.beginText();
>                 stream.setFont(tahoma, 20);
>                 stream.newLineAtOffset(50, 650);
>                 stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C1");
>                 stream.endText();
>                 // Arial Unicode: ZWNJ glyph contains no outline -> not shown
>                 // in PDF (as expected)
>                 PDFont arialu = PDType0Font.load(document, new File("C:/Windows/Fonts/ARIALUNI.TTF"));
>                 stream.beginText();
>                 stream.setFont(arialu, 20);
>                 stream.newLineAtOffset(50, 600);
>                 stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C2");
>                 stream.endText();
>                 // Google Noto Sans Regular: ZWNJ glyph is a vertical bar, but
>                 // advanceWidth in hmtx table is 0 -> shown in PDF anyway (unexpected)
>                 PDFont gnotos = PDType0Font.load(document, new File("noto-sans-regular.ttf"));
>                 stream.beginText();
>                 stream.setFont(gnotos, 20);
>                 stream.newLineAtOffset(50, 550);
>                 stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C3");
>                 stream.endText();
>             }
>             document.save("zwnj.pdf");
>         }
>     }
> }
> {code}



