[ 
https://issues.apache.org/jira/browse/PDFBOX-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807637#comment-17807637
 ] 

Marcus Korinth commented on PDFBOX-5753:
----------------------------------------

Just to let you know (I might open another ticket): I have a similar 
problem with fonts. The fonts get changed. I did not have these issues with 
pdfbox 2.0.26.

I have written a test environment which splits N documents and performs 
multiple tests, for example image comparison. The goal was to check whether the 
split pages look the same as in the original document. I never had issues in 
that regard with pdfbox 2.0.26 (but I had other problems, which required a lot 
of custom parsing/processing to be able to split the original documents at all).
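
For illustration only, the image comparison mentioned above could be sketched 
roughly as follows. This is not the code of the actual test environment: the 
class name PageImageDiff, the method names and the tolerance parameter are 
hypothetical. The sketch assumes both pages have already been rendered to a 
BufferedImage of the same size (for example with PDFBox's PDFRenderer) and 
uses only the JDK.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox: compares two rendered pages pixel by pixel.
final class PageImageDiff {

    /** Counts pixels whose largest RGB channel difference exceeds the tolerance. */
    static long countDifferingPixels(BufferedImage a, BufferedImage b, int tolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("Images must have identical dimensions");
        }
        long differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (channelDelta(a.getRGB(x, y), b.getRGB(x, y)) > tolerance) {
                    differing++;
                }
            }
        }
        return differing;
    }

    /** Largest absolute difference across the red, green and blue channels (alpha ignored). */
    static int channelDelta(int rgb1, int rgb2) {
        int dr = Math.abs(((rgb1 >> 16) & 0xFF) - ((rgb2 >> 16) & 0xFF));
        int dg = Math.abs(((rgb1 >> 8) & 0xFF) - ((rgb2 >> 8) & 0xFF));
        int db = Math.abs((rgb1 & 0xFF) - (rgb2 & 0xFF));
        return Math.max(dr, Math.max(dg, db));
    }
}
```

A small tolerance would absorb rendering noise such as anti-aliasing, while a 
shifted color tone like the one reported in this issue would still show up as 
a large number of differing pixels.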

> multipdf.Splitter - Changes color of images in splitted pages
> -------------------------------------------------------------
>
>                 Key: PDFBOX-5753
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5753
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.1 PDFBox
>         Environment: macOS Sonoma 14.2.1
>            Reporter: Marcus Korinth
>            Priority: Major
>         Attachments: original document.pdf, page 10.pdf, 
> split-with-snapshot-p10.pdf
>
>
> When using the default {{org.apache.pdfbox.multipdf.Splitter}}, the colors of 
> the split pages get changed, so the extracted pages no longer match the 
> original.
> The attachments contain an example: the yellow tone on the extracted page 10 
> differs from the yellow tone on page 10 of the original 
> document. See {{original document.pdf}} vs {{page 10.pdf}}.
> The Java code which has been used to extract the pages is:
> {code:java}
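> import java.io.File;
> import java.util.List;
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.multipdf.Splitter;
> import org.apache.pdfbox.pdmodel.PDDocument;
> // inputPath, outputPath, OUT_PREFIX and OUT_SUFFIX are defined elsewhere
> try (final PDDocument document = Loader.loadPDF(new File(inputPath))) {
>     Splitter splitter = new Splitter();
>     List<PDDocument> pages = splitter.split(document);
>     for (int i = 0; i < pages.size(); i++) {
>         final String fileName = OUT_PREFIX + (i + 1) + OUT_SUFFIX;
>         final File file = new File(outputPath, fileName);
>         // Close each split document once it has been saved
>         try (PDDocument page = pages.get(i)) {
>             page.save(file);
>         }
>     }
> }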
> {code}
> The pom:
> {code:xml}
> <dependency>
>     <groupId>org.apache.pdfbox</groupId>
>     <artifactId>pdfbox</artifactId>
>     <version>3.0.1</version>
> </dependency>
> {code}
> Also I am using Java 21.


