[ https://issues.apache.org/jira/browse/PDFBOX-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932834#comment-17932834 ]
Matti Oinas commented on PDFBOX-4728:
-------------------------------------

The problem occurs when an XObject has a name that is not encoded in UTF-8.

!image-2025-03-06-07-28-20-426.png!

The content stream references that object:
{code:java}
...
q
538.554 0 0 239.811 28.346 587.083 cm
/Løvetann Do
Q
...{code}
Without the fix, the XObject name changes during the save operation, but the content stream still references the object by its original name.

> Broken PDF after load and save
> ------------------------------
>
>                 Key: PDFBOX-4728
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4728
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Writing
>    Affects Versions: 2.0.18, 3.0.0 PDFBox, 3.0.4 PDFBox
>            Reporter: Matti Oinas
>            Priority: Major
>         Attachments: PDFBOX-4728.patch, image-2025-03-06-07-28-20-426.png
>
>
> If a PDF is read using the WINDOWS-1252 charset and written using
> UTF-8, the resulting PDF is broken after the load and save operations.
> {{PDDocument document = PDDocument.load(sourcePath);}}
> {{document.save(targetPath);}}
> This happens if the source PDF contains an XObject dictionary reference
> whose name isn't encoded in UTF-8. For example:
> /L#f8vetann 16 0 R
> That name is read using WINDOWS-1252 encoding. If the write operation
> then uses UTF-8, the resulting name will be
> /L#3Fvetann 16 0 R
> The resulting PDF is broken and the image is missing.
> FIX in pull request: https://github.com/apache/pdfbox/pull/77



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
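The byte-level effect described in the issue can be reproduced with plain JDK classes. The sketch below is only an illustration and does not use PDFBox: the class `NameRoundTrip` and its helpers `decode` and `escape` are hypothetical names, and the escaping rule (everything except ASCII letters and digits becomes `#XX`) is a simplification. It decodes the raw name bytes as WINDOWS-1252 and then re-encodes the resulting string with three charsets. Note that `#3F` is the code for `?`, the byte Java substitutes when a charset cannot represent a character. The sketch uses US-ASCII to reproduce that substitution, which is an assumption about where the `?` comes from, not a statement about PDFBox's writer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NameRoundTrip {
    // Charset named in the issue as the one used when reading the name.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical helper: decode raw PDF name bytes the way the issue describes.
    static String decode(byte[] raw) {
        return new String(raw, CP1252);
    }

    // Hypothetical helper: write a PDF-style name, escaping every byte that is
    // not an ASCII letter or digit as #XX (simplified rule, upper-case hex).
    static String escape(byte[] bytes) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : bytes) {
            int v = b & 0xFF;
            boolean plain = (v >= 'A' && v <= 'Z')
                    || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9');
            if (plain) {
                sb.append((char) v);
            } else {
                sb.append('#').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Raw bytes of the name from the issue: /L#f8vetann
        byte[] raw = {'L', (byte) 0xF8, 'v', 'e', 't', 'a', 'n', 'n'};
        String name = decode(raw);

        // Same charset on both sides: the original bytes survive.
        System.out.println(escape(name.getBytes(CP1252)));                    // /L#F8vetann
        // UTF-8 encodes the character as two bytes, so the name changes.
        System.out.println(escape(name.getBytes(StandardCharsets.UTF_8)));    // /L#C3#B8vetann
        // A charset that cannot represent the character substitutes '?' (0x3F).
        System.out.println(escape(name.getBytes(StandardCharsets.US_ASCII))); // /L#3Fvetann
    }
}
```

Only the first variant keeps the name byte-identical to the one in the source file. In the other two, the dictionary key no longer matches the bytes of the name the content stream uses, which is consistent with the missing image reported above.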