PDFBOX-5704

Cheng Li Thu, 18 Jan 2024 22:23:37 -0800

Hello team,



I recently ran into a problem where PDFBox cannot render Chinese text. The
problem is very similar to https://issues.apache.org/jira/browse/PDFBOX-5704.



In this case, the attached PDF file embeds a CFF font file. The correct
CIDFont subtype is /CIDFontType0, and the font program should be declared
under /FontFile3 with stream subtype /CIDFontType0C. Instead, the file
wrongly declares the subfont as TrueType, so PDFBox uses its TTF parser on
the font file stream, based on the declared type.



According to the spec, PDFBox behaves correctly, but from a user's
perspective this looks more like a “bug”, because the file displays fine in
the other most widely used PDF readers (Adobe, Foxit, pdf.js, etc.).



I have many years of working experience in PDF generation (iText, PDFBox,
etc.), and I know that once a PDF is generated, as long as it displays
correctly in Adobe Reader, it is considered correct. If another program
cannot display it correctly, that is considered a bug in the other program.
It’s not fair, but it’s reality. Many low-quality PDF generation
tools/libraries are still widely used.



pdf.js parses the font file first and prefers the font type found in the
font file over the type declared in the font dictionary.

https://github.com/mozilla/pdf.js/blob/1cdbcfef821c7f6e81ea22fe68a8b815bca01c4e/src/core/fonts.js#L1052
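
To illustrate the idea, here is a minimal sketch of detecting the font type
from the first bytes of the embedded font stream. It is neither pdf.js nor
PDFBox code. The class name FontSniffer, the Kind enum and the exact
heuristics (especially the bare-CFF check) are my own assumptions, and real
files may need more careful checks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: guesses the font program type
// from the leading bytes of the embedded font stream.
final class FontSniffer {

    enum Kind { TRUETYPE, OPENTYPE_CFF, TRUETYPE_COLLECTION, TYPE1, CFF, UNKNOWN }

    static Kind sniff(byte[] data) {
        if (data == null || data.length < 4) {
            return Kind.UNKNOWN;
        }
        String tag = new String(data, 0, 4, StandardCharsets.ISO_8859_1);
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;
        int b2 = data[2] & 0xFF;
        int b3 = data[3] & 0xFF;

        // sfnt version 0x00010000 or the tag "true": TrueType outlines
        if ((b0 == 0 && b1 == 1 && b2 == 0 && b3 == 0) || tag.equals("true")) {
            return Kind.TRUETYPE;
        }
        // "OTTO": OpenType container with CFF outlines
        if (tag.equals("OTTO")) {
            return Kind.OPENTYPE_CFF;
        }
        // "ttcf": TrueType collection
        if (tag.equals("ttcf")) {
            return Kind.TRUETYPE_COLLECTION;
        }
        // "%!": PostScript Type 1 font program
        if (tag.startsWith("%!")) {
            return Kind.TYPE1;
        }
        // Bare CFF header: major version 1, then minor, hdrSize and an
        // offSize in 1..4. This is a loose heuristic (assumption).
        if (b0 == 1 && b3 >= 1 && b3 <= 4) {
            return Kind.CFF;
        }
        return Kind.UNKNOWN;
    }
}
```

A tolerant reader could call sniff() on the decoded font stream and, when the
result disagrees with the subtype declared in the font dictionary, choose the
parser according to the sniffed kind.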



So my question is: would it be possible for PDFBox to provide some font
processing workaround logic to handle such cases?



Thanks

Mike
