Edward Ashley created PDFBOX-6209:
-------------------------------------
Summary: Regression in v3.0.7 causes Splitter to extract pages
with text converted to symbols
Key: PDFBOX-6209
URL: https://issues.apache.org/jira/browse/PDFBOX-6209
Project: PDFBox
Issue Type: Bug
Components: Utilities
Affects Versions: 3.0.7 PDFBox
Reporter: Edward Ashley
Attachments: Screenshot 2026-06-10 at 14.03.59.png, letter-redacted.pdf
When splitting certain PDFs into pages, the Splitter corrupts some of the
extracted pages: their text is converted to symbols. This works in version
3.0.6 but not in 3.0.7.
Example Code:
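{code:java}
import java.io.File;
import javax.swing.filechooser.FileSystemView;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

// Assumes a JUnit @Test import and an SLF4J-style 'log' field in the enclosing test class.
@Test
public void testSplitPage() {
    // Input and output files live in the user's home directory.
    String home = FileSystemView.getFileSystemView().getHomeDirectory() + File.separator;
    try (PDDocument doc = Loader.loadPDF(new File(home + "input.pdf"))) {
        // With default settings the Splitter produces one document per page.
        var pages = new Splitter().split(doc);
        int count = 0;
        for (var page : pages) {
            page.save(home + "output-" + count++ + ".pdf");
        }
    } catch (Exception ex) {
        log.error("Error splitting PDF: {}", ex.getMessage(), ex);
    }
}
{code}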
I have attached an example PDF that shows the problem, and a screenshot of the
corrupted output.