[
https://issues.apache.org/jira/browse/PDFBOX-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929339#comment-17929339
]
Andreas Lehmkühler edited comment on PDFBOX-5961 at 2/22/25 11:17 AM:
----------------------------------------------------------------------
We don't store the original byte values but convert them into integer values.
In doing so, the original length of the byte sequence gets lost. According to
PDFBOX-4749, it is important to know the length, at least in some corner cases.
On the other hand, I wrote in that ticket that we should consider refactoring
the (too early?!?) conversion of the byte value(s) to an integer.
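
To illustrate the point about the lost length, here is a minimal sketch. The class CodeLengthDemo and the helper toInt are hypothetical names made up for this example and are not PDFBox API; the sketch only assumes a plain big-endian conversion of the code bytes to an int.

```java
import java.util.Arrays;

public class CodeLengthDemo {
    // Hypothetical helper (not PDFBox API): folds a byte sequence into an int,
    // most significant byte first. The number of input bytes is not kept.
    static int toInt(byte[] code) {
        int value = 0;
        for (byte b : code) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] oneByte = {0x41};
        byte[] twoBytes = {0x00, 0x41};

        // Both sequences collapse to the same integer value 0x41 ...
        System.out.println(toInt(oneByte) == toInt(twoBytes));
        // ... although the original byte sequences differ in length.
        System.out.println(Arrays.equals(oneByte, twoBytes));
    }
}
```

Once only the int is stored, a one-byte code and a zero-padded two-byte code can no longer be told apart, which is the kind of corner case where the length matters.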
> IllegalArgumentException: Not a valid Unicode code point: 0xE28496
> ------------------------------------------------------------------
>
> Key: PDFBOX-5961
> URL: https://issues.apache.org/jira/browse/PDFBOX-5961
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.33, 3.0.4 PDFBox, 4.0.0
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.34, 3.0.5 PDFBox, 4.0.0
>
> Attachments: PDFJS-19527.pdf
>
>
> {noformat}
> IllegalArgumentException: Not a valid Unicode code point: 0xE28496
> java.base/java.lang.String.valueOfCodePoint(String.java:3345)
> java.base/java.lang.Character.toString(Character.java:8053)
> org.apache.pdfbox.pdmodel.font.PDType0Font.toUnicode(PDType0Font.java:548)
> org.apache.pdfbox.pdmodel.font.PDFont.toUnicode(PDFont.java:450)
> org.apache.pdfbox.text.LegacyPDFStreamEngine.showGlyph(LegacyPDFStreamEngine.java:279)
> org.apache.pdfbox.debugger.pagepane.DebugTextOverlay$DebugTextStripper.showGlyph(DebugTextOverlay.java:209)
> org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:792)
> org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:651)
> {noformat}
> The problem is somehow related to the /ToUnicode stream at
> {{Root/Pages/Kids/[0]/Resources/Font/F3/ToUnicode}}. This is a different bug
> from PDFBOX-5960 and not the problem reported in PDF.js 19527. I played around
> a bit with supporting 3-byte codes (memo for me: version before 21.2 12:20) but
> it's still the same exception.
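
The exception itself can be reproduced outside PDFBox. The sketch below is only an illustration: CodePointDemo and its methods are hypothetical names, and Character.toString(int) requires Java 11 or later. It shows that 0xE28496 lies above the largest Unicode code point (0x10FFFF) and is therefore rejected. As a side observation, the three bytes E2 84 96 happen to be the UTF-8 encoding of U+2116; whether the producer of the file intended that is an assumption, not something the report confirms.

```java
import java.nio.charset.StandardCharsets;

public class CodePointDemo {
    // The value from the exception message: three bytes packed into one int.
    static final int PACKED = 0xE28496;

    // Returns true if Character.toString(int) rejects the value (Java 11+).
    static boolean rejectedAsCodePoint(int value) {
        try {
            Character.toString(value);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Unpacks the low `length` bytes (big-endian) and decodes them as UTF-8.
    // Treating the bytes as UTF-8 is an assumption made for illustration only.
    static String decodeAsUtf8(int packed, int length) {
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = (byte) (packed >>> (8 * (length - 1 - i)));
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Character.isValidCodePoint(PACKED)); // false
        System.out.println(rejectedAsCodePoint(PACKED));        // true
        // Prints the code point obtained under the UTF-8 assumption.
        System.out.printf("U+%04X%n", decodeAsUtf8(PACKED, 3).codePointAt(0));
    }
}
```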