[jira] [Updated] (PDFBOX-6209) Regression in v3.0.7 causes Splitter to extract pages with text converted to symbols

Edward Ashley (Jira) Wed, 10 Jun 2026 06:06:05 -0700


     [ 
https://issues.apache.org/jira/browse/PDFBOX-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Edward Ashley updated PDFBOX-6209:
----------------------------------
    Description: 
When splitting certain PDFs, the Splitter corrupts some of the extracted pages. 
This works in version 3.0.6 but not in 3.0.7.

Example Code:
{code:java}
@Test
public void testSplitPage() {
    try {
        var inputFile = new ClassPathResource("/letter-redacted.pdf").getContentAsByteArray();
        try (PDDocument doc = Loader.loadPDF(inputFile)) {
            var pages = new Splitter().split(doc);
            int count = 0;
            for (var page : pages) {
                page.save(
                        FileSystemView.getFileSystemView().getHomeDirectory()
                                + File.separator
                                + "Downloads/output-" + count++ + ".pdf");
            }
        }
    } catch (Exception ex) {
        log.error("Error splitting PDF: {}", ex.getMessage(), ex);
    }
}{code}
I have attached an example PDF that shows the problem and a screenshot of the 
corrupt output.

  was:
When splitting certain PDFs, the Splitter corrupts some of the extracted pages. 
This works in version 3.0.6 but not in 3.0.7.

Example Code:
{code:java}
@Test
public void testSplitPage() {
    try {
        try (PDDocument doc = Loader.loadPDF(
                new File(
                        FileSystemView.getFileSystemView().getHomeDirectory()
                                + File.separator
                                + "input.pdf"))) {
            var pages = new Splitter().split(doc);
            int count = 0;
            for (var page : pages) {
                page.save(
                        FileSystemView.getFileSystemView().getHomeDirectory()
                                + File.separator
                                + "output-" + count++ + ".pdf");
            }
        }
    } catch (Exception ex) {
        log.error("Error splitting PDF: {}", ex.getMessage(), ex);
    }
} {code}
I have attached an example PDF that shows the problem and a screenshot of the 
corrupt output.


> Regression in v3.0.7 causes Splitter to extract pages with text converted to 
> symbols
> ------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-6209
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6209
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 3.0.7 PDFBox
>            Reporter: Edward Ashley
>            Priority: Minor
>         Attachments: Screenshot 2026-06-10 at 14.03.59.png, 
> letter-redacted.pdf
>
>
> When splitting certain PDFs, the Splitter corrupts some of the extracted 
> pages. This works in version 3.0.6 but not in 3.0.7.
> Example Code:
> {code:java}
> @Test
> public void testSplitPage() {
>     try {
>         var inputFile = new ClassPathResource("/letter-redacted.pdf").getContentAsByteArray();
>         try (PDDocument doc = Loader.loadPDF(inputFile)) {
>             var pages = new Splitter().split(doc);
>             int count = 0;
>             for (var page : pages) {
>                 page.save(
>                         FileSystemView.getFileSystemView().getHomeDirectory()
>                                 + File.separator
>                                 + "Downloads/output-" + count++ + ".pdf");
>             }
>         }
>     } catch (Exception ex) {
>         log.error("Error splitting PDF: {}", ex.getMessage(), ex);
>     }
> }{code}
> I have attached an example PDF that shows the problem and a screenshot of the 
> corrupt output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
