Edward Ashley created PDFBOX-6209:
-------------------------------------

             Summary: Regression in v3.0.7 causes Splitter to extract pages with text converted to symbols
                 Key: PDFBOX-6209
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6209
             Project: PDFBox
          Issue Type: Bug
          Components: Utilities
    Affects Versions: 3.0.7 PDFBox
            Reporter: Edward Ashley
         Attachments: Screenshot 2026-06-10 at 14.03.59.png, letter-redacted.pdf

When splitting certain PDFs, the Splitter corrupts some of the output pages: 
their text is converted to symbols. This works in version 3.0.6 but not in 3.0.7.

Example Code:
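{code:java}
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.swing.filechooser.FileSystemView;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.junit.jupiter.api.Test; // assuming JUnit 5

@Test
public void testSplitPage() throws IOException {
    // Input is read from, and outputs are written to, the user's home directory.
    // Exceptions propagate so the test fails instead of only being logged.
    File home = FileSystemView.getFileSystemView().getHomeDirectory();
    try (PDDocument doc = Loader.loadPDF(new File(home, "input.pdf"))) {
        // Default Splitter settings: one output document per source page
        List<PDDocument> pages = new Splitter().split(doc);
        int count = 0;
        for (PDDocument page : pages) {
            // Each split document is saved and then closed by the caller
            try (page) {
                page.save(new File(home, "output-" + count++ + ".pdf"));
            }
        }
    }
}
{code}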
I have attached an example PDF that exhibits the problem, and a screenshot of 
the corrupted output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)