[ https://issues.apache.org/jira/browse/PDFBOX-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929225#comment-17929225 ]
Tilman Hausherr commented on PDFBOX-5961:
-----------------------------------------

No improvements (not surprising: in the debug output, the № was the only code with more than 2 bytes). There are also no changes in my own text extraction test files. We'll see in the "big" regression tests whether anything new shows up. I'll commit to trunk during the weekend and then wait a bit before committing to the other versions.

> IllegalArgumentException: Not a valid Unicode code point: 0xE28496
> ------------------------------------------------------------------
>
>                 Key: PDFBOX-5961
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5961
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Tilman Hausherr
>            Priority: Major
>         Attachments: PDFJS-19527.pdf
>
>
> {noformat}
> IllegalArgumentException: Not a valid Unicode code point: 0xE28496
> java.base/java.lang.String.valueOfCodePoint(String.java:3345)
> java.base/java.lang.Character.toString(Character.java:8053)
> org.apache.pdfbox.pdmodel.font.PDType0Font.toUnicode(PDType0Font.java:548)
> org.apache.pdfbox.pdmodel.font.PDFont.toUnicode(PDFont.java:450)
> org.apache.pdfbox.text.LegacyPDFStreamEngine.showGlyph(LegacyPDFStreamEngine.java:279)
> org.apache.pdfbox.debugger.pagepane.DebugTextOverlay$DebugTextStripper.showGlyph(DebugTextOverlay.java:209)
> org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:792)
> org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:651)
> {noformat}
> The problems are somehow related to the /ToUnicode stream at
> {{Root/Pages/Kids/[0]/Resources/Font/F3/ToUnicode}}. This is a different bug
> from PDFBOX-5960 and not the problem that is in PDF.js 19527. I played around
> a bit with supporting 3-byte codes (memo for me: version before 21.2 12:20), but
> it's still the same exception.
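The value in the exception, 0xE28496, is exactly the UTF-8 byte sequence for U+2116 NUMERO SIGN (№). This fits the observation that the № was the only code with more than 2 bytes. It suggests, though this is not confirmed here, that the ToUnicode destination for that glyph may have been written as UTF-8 rather than the UTF-16BE that the PDF specification prescribes for ToUnicode destination strings.

The sketch below is illustrative only. It is not PDFBox code and not the fix committed for this issue. The class and helper names (`NumeroSignDemo`, `packBigEndian`, `toStringOrNull`) are hypothetical. The sketch shows two things:

* Packing the three bytes into one `int` gives a value above `Character.MAX_CODE_POINT` (0x10FFFF), so `Character.toString(int)` (Java 11+) throws.
* The same bytes decode as UTF-8 to the single character U+2116.

```java
import java.nio.charset.StandardCharsets;

public class NumeroSignDemo {

    /** Packs bytes big-endian into one int, as if the whole sequence were a single code point. */
    static int packBigEndian(byte[] bytes) {
        int value = 0;
        for (byte b : bytes) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    /** Hypothetical guard: returns null instead of throwing when the value is not a valid code point. */
    static String toStringOrNull(int codePoint) {
        return Character.isValidCodePoint(codePoint) ? Character.toString(codePoint) : null;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xE2, (byte) 0x84, (byte) 0x96};

        int packed = packBigEndian(bytes);
        System.out.printf("packed = 0x%X, max code point = 0x%X%n", packed, Character.MAX_CODE_POINT);
        // prints: packed = 0xE28496, max code point = 0x10FFFF

        try {
            Character.toString(packed); // Java 11+; throws for values above 0x10FFFF
        } catch (IllegalArgumentException e) {
            System.out.println("Character.toString threw: " + e.getMessage());
        }

        // Read as UTF-8, the same three bytes form the single character U+2116 NUMERO SIGN.
        String utf8 = new String(bytes, StandardCharsets.UTF_8);
        System.out.printf("as UTF-8: U+%04X%n", utf8.codePointAt(0));
        // prints: as UTF-8: U+2116
    }
}
```

A guard like `toStringOrNull` only avoids the exception. Whether to drop such a mapping, retry it as UTF-8, or handle it some other way is a separate decision that this sketch does not make.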
-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)