[ 
https://issues.apache.org/jira/browse/PDFBOX-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034433#comment-15034433
 ] 

Jim deVos commented on PDFBOX-3142:
-----------------------------------

Andreas - thanks for your reply. I'll run these source documents through a PDF 
validator to see what it finds. Individually they open just fine (i.e. no 
blank pages) in various PDF viewers, but I suspect that these viewers are 
pretty forgiving with non-compliant files. On that note, it would be nice to 
know of a way to anticipate whether a file will cause these issues before 
attempting to merge it with a cover page. At the moment all I see is the 
aforementioned error message in the log, but I don't see a way to interrogate 
the parser to see if it has issues with the file.
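
One possible pre-check, sketched here under assumptions: the log line seems to suggest a cross-reference entry whose offset does not land on the object it claims to describe. The stdlib-only helper below (XrefPrecheck and brokenObjects are hypothetical names, not part of the PDFBox API) reads the last classic xref table in a file and reports the object numbers whose offsets do not point at an "N G obj" header. It ignores xref streams and earlier tables reached via /Prev, and it loads the whole file into memory, so it is a rough triage tool rather than a validator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not PDFBox API: checks classic xref offsets only.
final class XrefPrecheck {

    /** Returns object numbers whose xref offset does not point at "N G obj". */
    static List<Integer> brokenObjects(byte[] pdf) {
        // ISO-8859-1 maps one byte to one char, so string indices equal byte offsets.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        List<Integer> broken = new ArrayList<>();

        int sx = s.lastIndexOf("startxref");
        if (sx < 0) {
            throw new IllegalArgumentException("no startxref keyword found");
        }
        String[] tail = s.substring(sx + "startxref".length()).trim().split("\\s+");
        int xrefPos = Integer.parseInt(tail[0]);
        if (!s.startsWith("xref", xrefPos)) {
            // Assumption: xref streams (PDF 1.5+) are out of scope for this sketch.
            throw new IllegalArgumentException("startxref does not point at a classic xref table");
        }

        int objNr = 0;
        for (String raw : s.substring(xrefPos + 4).split("\\r?\\n|\\r")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("trailer")) {
                break; // end of the xref table
            }
            String[] f = line.split("\\s+");
            if (f.length == 2) {
                // Subsection header: first object number, entry count.
                objNr = Integer.parseInt(f[0]);
            } else if (f.length == 3) {
                // Entry: 10-digit offset, 5-digit generation, 'n' (in use) or 'f' (free).
                if (f[2].equals("n")) {
                    long offset = Long.parseLong(f[0]);
                    String header = objNr + " " + Integer.parseInt(f[1]) + " obj";
                    if (offset >= s.length() || !s.startsWith(header, (int) offset)) {
                        broken.add(objNr);
                    }
                }
                objNr++;
            }
        }
        return broken;
    }
}
```

Calling XrefPrecheck.brokenObjects(Files.readAllBytes(documentPdf.toPath())) before the merge would return an empty list when every in-use entry of that table lines up with an object header. An empty list does not prove the file is compliant; it only rules out this one kind of damage.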

As for v2, that's a good suggestion. I'll rewrite my test for 2.0.0 and report 
the results.

> PDFMergerUtility with scratch file generates result with blank pages for 
> certain source files.
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3142
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3142
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.8.10
>         Environment: Ubuntu 14.04.3, java 1.8.0_66
>            Reporter: Jim deVos
>
> My team uses PDFMergerUtility to attach cover pages to various PDFs. We 
> recently tried using a scratch file (e.g. 
> PDFMergerUtility.mergeDocumentsNonSeq()) to cut down on the amount of RAM we 
> are using. This approach works for the majority of PDFs in our system, but 
> some files cause the merger utility to generate a result PDF with a blank 
> page. Specifically, the result PDF contains a blank page after the cover page 
> instead of the first page of the second document sent to the merger utility.
> Whenever this problem occurs, we see the following line in our logs:
> {{org.apache.pdfbox.pdfparser.NonSequentialPDFParser - Can't find the object 
> 52 0 (origin offset 7187557)}}
> I'll try to attach/link an example pdf soon, but currently I don't have 
> permission to redistribute any files that exhibit the problem.  However,  
> here's a simple snippet that replicates the problem - it's pretty 
> straightforward.
> {code}
>     @Test
>     public void testMergeNonSeq() throws IOException, COSVisitorException {
>         destinationPdf = new File(TMP_FOLDER, "result-nonseq.pdf");
>         PDFMergerUtility ut = new PDFMergerUtility();
>         RandomAccess ram = new RandomAccessFile(File.createTempFile("mergeram", ".bin"), "rw");
>         ut.addSource(coverpagePdf);
>         ut.addSource(documentPdf);
>         ut.setDestinationFileName(destinationPdf.getCanonicalPath());
>         ut.mergeDocumentsNonSeq(ram);
>
>         // the only automated way we have to tell that something went wrong is to check the size of the result
>         assertThat("destination pdf should be larger than the original pdf",
>                 destinationPdf.length(), is(greaterThan(documentPdf.length())));
>     }
> {code}
> Note we only see this problem with PDFMergerUtility.mergeDocumentsNonSeq().  
> Using PDFMergerUtility.mergeDocuments() does not exhibit any problems.
