[ https://issues.apache.org/jira/browse/PDFBOX-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932868#comment-17932868 ]
Andreas Lehmkühler commented on PDFBOX-4728:
--------------------------------------------

I'd prefer an approach where the encoding is fixed after saving it. But it is hard to implement something without a sample PDF. Maybe we can create one ourselves.

> Broken PDF after load and save
> ------------------------------
>
>                 Key: PDFBOX-4728
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4728
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Writing
>    Affects Versions: 2.0.18, 3.0.0 PDFBox, 3.0.4 PDFBox
>            Reporter: Matti Oinas
>            Priority: Major
>         Attachments: PDFBOX-4728.patch, image-2025-03-06-07-28-20-426.png
>
>
> If the read was done using the WINDOWS-1252 charset and the write is done using UTF-8, the resulting PDF will be broken after load and save operations.
> {{PDDocument document = PDDocument.load(sourcePath);}}
> {{document.save(targetPath);}}
> This happens if the source PDF contains an XObject dictionary reference whose name isn't encoded in UTF-8. For example:
> /L#f8vetann 16 0 R
> That name is read using the WINDOWS-1252 encoding. If the write operation then uses UTF-8, the resulting name will be
> /L#3Fvetann 16 0 R
> The resulting PDF is broken and the image is missing.
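
Since no sample PDF is attached, the charset mismatch from the report can be sketched in plain Java without PDFBox. This is only an illustration: the class `NameRoundTrip` and its methods are hypothetical, the escaping rule (everything except ASCII letters and digits becomes `#XX`) is a simplification, and nothing here claims to mirror what the PDFBox parser or writer actually does. The third case in `main` is an assumption showing one way a literal 0x3F can arise, namely when the write charset cannot represent the decoded character. The report itself attributes the breakage to a WINDOWS-1252 read followed by a UTF-8 write.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NameRoundTrip {
    // Raw bytes behind /L#f8vetann: 0xF8 is the letter o-with-stroke in WINDOWS-1252.
    static final byte[] RAW = {'L', (byte) 0xF8, 'v', 'e', 't', 'a', 'n', 'n'};

    // Simplified escaping (an assumption for illustration): ASCII letters and
    // digits are written as-is, every other byte is written as #XX.
    static String escape(byte[] bytes) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : bytes) {
            int v = b & 0xFF;
            boolean plain = (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9');
            if (plain) {
                sb.append((char) v);
            } else {
                sb.append(String.format("#%02X", v));
            }
        }
        return sb.toString();
    }

    // Decode the raw name bytes with one charset, re-encode with another.
    static String roundTrip(byte[] raw, Charset readAs, Charset writeAs) {
        String decoded = new String(raw, readAs);
        return escape(decoded.getBytes(writeAs));
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // Same charset on both sides: the original byte survives.
        System.out.println(roundTrip(RAW, cp1252, cp1252));
        // Read as WINDOWS-1252, write as UTF-8: the single byte becomes two.
        System.out.println(roundTrip(RAW, cp1252, StandardCharsets.UTF_8));
        // Read as UTF-8 (0xF8 is malformed), write as US-ASCII: yields 0x3F.
        System.out.println(roundTrip(RAW, StandardCharsets.UTF_8, StandardCharsets.US_ASCII));
    }
}
```

In every mismatched case the written name no longer matches the bytes of the original key, which is consistent with the missing image described in the report. A real reproduction would still need a PDF whose XObject name contains such a byte.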