[ 
https://issues.apache.org/jira/browse/PDFBOX-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930762#comment-17930762
 ] 

Daniel Gredler commented on PDFBOX-5230:
----------------------------------------

Thanks for the pointer! I'll pursue this avenue and let you know how it goes.

I think ZWNJ, ZWSP, WJ and ZWNBSP can all be treated the same way: no 
additional handling is needed in the GSUB processing, since all we need is 
for them to be present in the text for shaping to behave correctly. Then, 
during font subsetting, we force these glyphs to be invisible (zero width, no 
contours), even if the source font defines them with a non-zero width or with 
contours. Liberation Sans (already used elsewhere in the tests) has a glyph 
for ZWNJ that is zero-width but has contours, so it seems like a decent test 
candidate.

I think it's worth leaving ZWJ and a few other zero-width characters out of 
scope for now, because they would need additional handling in the GSUB 
processing or elsewhere (i.e. they are not as simple).
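
A minimal sketch of the subsetting rule described above, to make the idea 
concrete. Everything here is hypothetical: `ZeroWidthGlyphs`, `Glyph` and 
`forSubset` are illustration-only names and not existing PDFBox or FontBox 
API, and a real change would have to hook into the actual subsetter. The 
sketch only shows the decision "blank out the glyph for these four code 
points, pass every other glyph through unchanged".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not PDFBox API): which glyphs to blank out when subsetting. */
final class ZeroWidthGlyphs {

    /** The four code points proposed for uniform treatment; ZWJ (U+200D) is left out on purpose. */
    private static final Set<Integer> FORCED_INVISIBLE = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    0x200B,   // ZWSP, zero width space
                    0x200C,   // ZWNJ, zero width non-joiner
                    0x2060,   // WJ, word joiner
                    0xFEFF    // ZWNBSP, zero width no-break space
            )));

    /** Minimal stand-in for the parts of a glyph that matter here. */
    static final class Glyph {
        final int advanceWidth;
        final int contourCount;

        Glyph(int advanceWidth, int contourCount) {
            this.advanceWidth = advanceWidth;
            this.contourCount = contourCount;
        }
    }

    private ZeroWidthGlyphs() {
    }

    static boolean isForcedInvisible(int codePoint) {
        return FORCED_INVISIBLE.contains(codePoint);
    }

    /**
     * Returns the glyph to write into the subset: an empty, zero-width glyph
     * for the forced-invisible code points, the source glyph otherwise.
     */
    static Glyph forSubset(int codePoint, Glyph source) {
        return isForcedInvisible(codePoint) ? new Glyph(0, 0) : source;
    }
}
```

With this rule, a ZWNJ glyph like the one in Liberation Sans (zero width, but 
with contours) would come out of `forSubset` with no contours, while ordinary 
glyphs and ZWJ would be left untouched.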

BTW, while digging around the GSUB code in `PDAbstractContentStream`, I 
created a focused clean-up PR for it: 
https://github.com/apache/pdfbox/pull/202 . More details are available at the 
link, but let me know whether you're interested in smaller cleanup PRs like 
this one.

> Zero-width non-joiner characters visible in generated PDF
> ---------------------------------------------------------
>
>                 Key: PDFBOX-5230
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5230
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox, PDModel, Writing
>    Affects Versions: 2.0.16
>            Reporter: Daniel Gredler
>            Priority: Major
>         Attachments: Af.pdf, zwnj-pdfkit.pdf, zwnj.pdf, zwnj.png
>
>
> I'd like to use the [zero-width 
> non-joiner|https://en.wikipedia.org/wiki/Zero-width_non-joiner] (ZWNJ) 
> character to prevent character shaping in some cases when using Arabic and 
> Indic scripts. This works correctly using some fonts like Arial Unicode 
> (character shaping is prevented and no ZWNJ glyph is visible in the PDF), but 
> does not work correctly when using fonts like Tahoma or Google Noto Sans 
> Regular, where the ZWNJ character is visible in the PDF. The ZWNJ glyph is 
> not visible when using these fonts in other programs, like Microsoft Word.
> I suspect that the `advanceWidth` settings in the `hmtx` table should be 
> taken into account somehow but are not, because the `advanceWidth` for this 
> glyph is 0 in both of these fonts which are erroneously generating visual 
> artifacts for the ZWNJ character (Tahoma and Google Noto Sans Regular).
> Test case generating the attached PDF file:
> {code:java}
> import java.io.File;
> import java.io.IOException;
>
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDPageContentStream;
> import org.apache.pdfbox.pdmodel.common.PDRectangle;
> import org.apache.pdfbox.pdmodel.font.PDFont;
> import org.apache.pdfbox.pdmodel.font.PDType0Font;
>
> public class ZwnjTest {
>     public static void main(String[] args) throws IOException {
>         try (PDDocument document = new PDDocument()) {
>             PDPage page = new PDPage(PDRectangle.LETTER);
>             document.addPage(page);
>             try (PDPageContentStream stream = new PDPageContentStream(document, page)) {
>                 // Tahoma: ZWNJ glyph is a vertical bar, but advanceWidth in
>                 // hmtx table is 0 -> shown in PDF anyway (unexpected)
>                 PDFont tahoma = PDType0Font.load(document, new File("C:/Windows/Fonts/tahoma.ttf"));
>                 stream.beginText();
>                 stream.setFont(tahoma, 20);
>                 stream.newLineAtOffset(50, 650);
>                 // U+200C = zero width non-joiner
>                 stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C1");
>                 stream.endText();
>                 // Arial Unicode: ZWNJ glyph contains no outline -> not shown
>                 // in PDF (as expected)
>                 PDFont arialu = PDType0Font.load(document, new File("C:/Windows/Fonts/ARIALUNI.TTF"));
>                 stream.beginText();
>                 stream.setFont(arialu, 20);
>                 stream.newLineAtOffset(50, 600);
>                 // U+200C = zero width non-joiner
>                 stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C2");
>                 stream.endText();
>                 // Google Noto Sans Regular: ZWNJ glyph is a vertical bar,
>                 // but advanceWidth in hmtx table is 0 -> shown in PDF anyway (unexpected)
>                 PDFont gnotos = PDType0Font.load(document, new File("noto-sans-regular.ttf"));
>                 stream.beginText();
>                 stream.setFont(gnotos, 20);
>                 stream.newLineAtOffset(50, 550);
>                 // U+200C = zero width non-joiner
>                 stream.showText("t\u200Ce\u200Cs\u200Ct\u200C \u200C3");
>                 stream.endText();
>             }
>             document.save("zwnj.pdf");
>         }
>     }
> }
> {code}


