[
https://issues.apache.org/jira/browse/PDFBOX-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501192#comment-17501192
]
Andreas Lehmkühler edited comment on PDFBOX-5384 at 3/20/22, 12:29 PM:
-----------------------------------------------------------------------
In theory, the toUnicode mapping isn't involved in rendering at all; it is
used for text extraction only.
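To illustrate the separation described above, here is a toy Java model of the two independent lookups for a Type 0 font with a CIDFontType2 descendant. It assumes Identity-H encoding and /CIDToGIDMap /Identity, and the class and method names are hypothetical, not PDFBox classes. The rendering path never reads the ToUnicode table, so in principle even an empty ToUnicode CMap cannot change which glyph is drawn.

```java
import java.util.Map;

// Hypothetical toy model, not PDFBox's API.
public class FontLookupModel {

    // ToUnicode CMap contents: character code -> Unicode text (extraction only)
    private final Map<Integer, String> toUnicode;

    public FontLookupModel(Map<Integer, String> toUnicode) {
        this.toUnicode = toUnicode;
    }

    /** Rendering: code -> CID -> GID, assuming Identity-H and /CIDToGIDMap /Identity. */
    public int glyphIdForRendering(int code) {
        int cid = code; // Identity-H: the 2-byte code is the CID
        return cid;     // Identity CIDToGIDMap: the CID is the GID; toUnicode is not consulted
    }

    /** Text extraction: code -> Unicode via ToUnicode, or null when unmapped. */
    public String textForExtraction(int code) {
        return toUnicode.get(code);
    }

    public static void main(String[] args) {
        // Hypothetical data: an empty ToUnicode CMap vs one that maps code 54 to "S"
        FontLookupModel empty = new FontLookupModel(Map.of());
        FontLookupModel mapped = new FontLookupModel(Map.of(54, "S"));

        // Same glyph either way: the ToUnicode content does not affect rendering
        System.out.println(empty.glyphIdForRendering(54) + " " + mapped.glyphIdForRendering(54)); // 54 54
        // Only extraction changes
        System.out.println(empty.textForExtraction(54) + " " + mapped.textForExtraction(54)); // null S
    }
}
```

The workarounds mentioned below are exactly the cases where a real renderer departs from this clean split.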
The given PDF is broken in several ways:
* it uses CIDFontType2 fonts but doesn't embed them
* it claims to use an identity mapping but doesn't
* the toUnicode CMap is named like an identity map, but it isn't one
* the toUnicode CMap doesn't contain any toUnicode mappings
* the toUnicode CMap contains CID-mappings only
* UPDATE: the CMap is malformed because its cidrange entries aren't split into
blocks of at most 100 entries
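The UPDATE above refers to the CMap syntax, where each `N begincidrange` ... `endcidrange` section declares its entry count N and may hold at most 100 entries. Below is a minimal Java sketch of a checker for that rule, assuming the usual text CMap layout of `<startCode> <endCode> startCID` entries; the class and method names are hypothetical, and this is not how PDFBox's own CMap parser works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker, not part of PDFBox.
public class CidRangeCheck {

    // Limit on entries per begincidrange/endcidrange section
    static final int MAX_ENTRIES_PER_BLOCK = 100;

    // "N begincidrange" opens a section that declares N entries
    private static final Pattern BEGIN = Pattern.compile("(\\d+)\\s+begincidrange");
    // One entry: <startCode> <endCode> startCID
    private static final Pattern ENTRY =
            Pattern.compile("<\\p{XDigit}+>\\s*<\\p{XDigit}+>\\s*\\d+");

    /** Returns one message per problem found in the cidrange sections of a CMap. */
    public static List<String> check(String cmapText) {
        List<String> problems = new ArrayList<>();
        boolean inBlock = false;
        int declared = 0;
        int actual = 0;
        int block = 0;
        for (String raw : cmapText.split("\\R")) {
            String line = raw.trim();
            Matcher begin = BEGIN.matcher(line);
            if (begin.matches()) {
                inBlock = true;
                declared = Integer.parseInt(begin.group(1));
                actual = 0;
                block++;
            } else if (inBlock && line.equals("endcidrange")) {
                if (actual > MAX_ENTRIES_PER_BLOCK) {
                    problems.add("block " + block + ": " + actual
                            + " entries, limit is " + MAX_ENTRIES_PER_BLOCK);
                }
                if (actual != declared) {
                    problems.add("block " + block + ": declares " + declared
                            + " entries but contains " + actual);
                }
                inBlock = false;
            } else if (inBlock) {
                Matcher entry = ENTRY.matcher(line);
                while (entry.find()) {
                    actual++; // count every entry, even several on one line
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical CMap fragment: 150 ranges in a single section
        StringBuilder cmap = new StringBuilder("150 begincidrange\n");
        for (int i = 0; i < 150; i++) {
            cmap.append(String.format("<%04x> <%04x> %d%n", i, i, i));
        }
        cmap.append("endcidrange\n");
        check(cmap.toString()).forEach(System.out::println); // block 1: 150 entries, limit is 100
    }
}
```

A lenient parser can simply accept oversized sections; the check is only useful for diagnosing files like this one.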
There are many workarounds in place to handle malformed PDFs, and in the given
case the toUnicode mapping is involved. Obviously the workaround added in
PDFBOX-4322 broke the rendering of this one. Maybe there is a way to support
both.
> Wrong glyphs used
> -----------------
>
> Key: PDFBOX-5384
> URL: https://issues.apache.org/jira/browse/PDFBOX-5384
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.25
> Reporter: Oliver Schmidtmer
> Priority: Major
> Labels: regression
> Attachments: DOR-EC E-N20_118345.pdf,
> image-2022-03-02-23-41-15-844.png
>
>
> The attached PDF uses Tahoma fonts.
> The correct font seems to be used, but with the wrong glyphs.
> For example, the "6" in the screenshot is definitely Tahoma glyph 25 / CID 54,
> where it should be "S", glyph 54 / CID 83.
> The "=" in the screenshot is glyph 32 / CID 61, where "Z", glyph 61 / CID 90,
> should be used.
> !image-2022-03-02-23-41-15-844.png!
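The numbers in the report line up in a telling way: the "CID" values are the Unicode code points of the characters (54 = U+0036 "6", 83 = U+0053 "S", 61 = U+003D "=", 90 = U+005A "Z"), and each wrong glyph's "CID" equals the correct glyph's ID. That pattern is consistent with the code in the content stream already being the intended glyph ID, while the drawn glyph was chosen by reading that code as a Unicode value and looking it up in Tahoma's cmap. This is a hypothesis, not a confirmed diagnosis. The Java sketch below only replays it with the four glyph IDs given in the report; all names are hypothetical.

```java
import java.util.Map;

// Hypothetical replay of the suspected mix-up, using only the glyph IDs from the report.
public class GlyphMixup {

    // Tahoma glyph IDs as reported in the issue (character -> GID)
    static final Map<Character, Integer> REPORTED_GID = Map.of(
            '6', 25, 'S', 54, '=', 32, 'Z', 61);

    /** Expected path under the hypothesis: the code is already the glyph ID. */
    static int gidDirect(int code) {
        return code;
    }

    /** Suspected path: the code is read as a Unicode value and looked up in the font's cmap. */
    static Integer gidViaUnicode(int code) {
        return REPORTED_GID.get((char) code);
    }

    public static void main(String[] args) {
        for (char intended : new char[] {'S', 'Z'}) {
            int code = REPORTED_GID.get(intended); // assumption: stream code == intended GID
            System.out.printf("intended %s: code %d -> direct GID %d, as U+%04X '%s' -> GID %d%n",
                    intended, code, gidDirect(code), code, (char) code, gidViaUnicode(code));
        }
    }
}
```

For "S" this yields GID 54 on the direct path but GID 25 ("6") on the Unicode path, and for "Z" GID 61 versus GID 32 ("="), matching the glyphs seen in the screenshot.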
--
This message was sent by Atlassian Jira
(v8.20.1#820001)