[ https://issues.apache.org/jira/browse/PDFBOX-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933070#comment-17933070 ]
Michael Klink edited comment on PDFBOX-4728 at 3/6/25 5:40 PM:
---------------------------------------------------------------

Names in PDFBox do indeed have quite a history. At one time, only US_ASCII bytes were handled properly, see e.g. [here on Stack Overflow|https://stackoverflow.com/a/48306517/1729265].

The main issue is, as you said, that PDFBox insists on interpreting names as text and even stores only that text in the name object.

I have to admit I may be partly to blame for the current way PDFBox interprets names, considering my comments on PDFBOX-3347 and elsewhere. While it was always clear that names must not be interpreted as text, I nonetheless played along in the "How should names be interpreted as text if they must be" line of discussion.

> Broken PDF after load and save
> ------------------------------
>
>                 Key: PDFBOX-4728
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4728
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Writing
>    Affects Versions: 2.0.18, 3.0.0 PDFBox, 3.0.4 PDFBox
>            Reporter: Matti Oinas
>            Priority: Major
>         Attachments: PDFBOX-4728.patch, image-2025-03-06-07-28-20-426.png
>
>
> If reading is done using the WINDOWS-1252 charset and writing is done using
> UTF-8, then the resulting PDF will be broken after load and save operations.
> {{PDDocument document = PDDocument.load(sourcePath);}}
> {{document.save(targetPath);}}
> This happens if the source PDF contains an XObject dictionary reference
> whose name isn't encoded in UTF-8, for example:
> /L#f8vetann 16 0 R
> That name is read using the WINDOWS-1252 encoding. If the write operation
> then uses UTF-8, the resulting name will be
> /L#3Fvetann 16 0 R
> and the resulting PDF is broken and the image is missing.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
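
The effect described in the report can be reproduced without PDFBox. The following is a minimal sketch in plain Java, not PDFBox code: the class NameRoundTrip and its methods are hypothetical names made up for this illustration, and the escaping rule (escape everything except ASCII letters and digits) is a deliberately conservative simplification. It decodes the name token from the report into bytes, then writes it back either via an intermediate string or directly from the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; none of this is PDFBox API.
class NameRoundTrip {

    // Decode a well-formed name token (without the leading '/') into raw bytes,
    // resolving #xx hex escapes.
    static byte[] decodeNameToken(String token) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '#' && i + 2 < token.length()) {
                out.write(Integer.parseInt(token.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c); // assumes unescaped characters are ASCII
            }
        }
        return out.toByteArray();
    }

    // Encode raw bytes as a name token. Assumption of this sketch: every byte
    // that is not an ASCII letter or digit is written as a #XX escape.
    static String encodeNameToken(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            boolean plain = (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9');
            if (plain) {
                sb.append((char) v);
            } else {
                sb.append(String.format("#%02X", v));
            }
        }
        return sb.toString();
    }

    // Round trip through text: decode the bytes with one charset,
    // then re-encode the resulting string with another.
    static String viaText(String token, Charset readAs, Charset writeAs) {
        String text = new String(decodeNameToken(token), readAs);
        return encodeNameToken(text.getBytes(writeAs));
    }

    public static void main(String[] args) {
        String token = "L#f8vetann";
        Charset cp1252 = Charset.forName("windows-1252");
        // Text round trips change the bytes of the name:
        System.out.println(viaText(token, cp1252, StandardCharsets.UTF_8));
        System.out.println(viaText(token, cp1252, StandardCharsets.US_ASCII));
        // Keeping the raw bytes preserves the name:
        System.out.println(encodeNameToken(decodeNameToken(token)));
    }
}
```

In this sketch, only the variant that never converts the name to text writes back the byte 0xF8. The US_ASCII variant yields the #3F seen in the report, because Java substitutes '?' for characters the target charset cannot encode. Whether that is the exact path taken inside a given PDFBox version is not established here.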