[ 
https://issues.apache.org/jira/browse/PDFBOX-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053341#comment-17053341
 ] 

Jorge Spinsanti commented on PDFBOX-4780:
-----------------------------------------

Ok [~tilman], but I'm not sure we should have to work with these corrupt files.

If I have 10 files to merge and one of them is corrupt, the merge of all 10 
files fails. So we are trying to run a fast analysis on each file to avoid the 
different exceptions thrown by PDFBox code.
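
A minimal sketch of that per-file analysis, assuming we simply want to split the inputs into usable and rejected ones before merging. The names MergePreCheck, Check, Result and partition are hypothetical and not part of PDFBox. The PDFBox-specific check is only shown in a comment, so the block itself needs nothing beyond the JDK.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: splits merge inputs into usable and rejected ones. */
final class MergePreCheck {

    /** A per-input check that may throw, for example while parsing a PDF. */
    @FunctionalInterface
    interface Check<T> {
        void run(T input) throws IOException;
    }

    /** Inputs that passed the check, plus the failure message of each one that did not. */
    static final class Result<T> {
        final List<T> usable = new ArrayList<>();
        final Map<T, String> rejected = new LinkedHashMap<>();
    }

    /**
     * Runs the check on every input and never aborts on a single failure.
     *
     * With PDFBox 2.0.x the check could look like this (assumption, untested
     * against the corrupt file from this issue):
     *
     *   file -> {
     *       try (PDDocument doc = PDDocument.load(file)) {
     *           for (PDPage page : doc.getPages()) {
     *               page.getAnnotations(); // throws on an unknown annotation type
     *           }
     *       }
     *   }
     */
    static <T> Result<T> partition(List<T> inputs, Check<T> check) {
        Result<T> result = new Result<>();
        for (T input : inputs) {
            try {
                check.run(input);
                result.usable.add(input);
            } catch (IOException | RuntimeException e) {
                // Remember why this input was skipped instead of failing the whole merge.
                result.rejected.put(input, String.valueOf(e.getMessage()));
            }
        }
        return result;
    }
}
```

Only the entries in usable would then be passed to the merge, and the entries in rejected could be logged or reported. The cost is that every file gets parsed twice, once for the check and once for the merge.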

Perhaps you could add some validation to the PDDocument class that we can invoke 
explicitly. In this case, 
{code}
try (PDDocument pdDocument = PDDocument.load(file)) {
  ...
}
{code}
was not enough.

We try to prevent this issue by doing
{code}
try {
    ...
    for (PDPage page : documentCatalog.getPages()) {
        for (PDAnnotation ann : page.getAnnotations()) {
            // drop the StructParent entry from every annotation
            ann.getCOSObject().removeItem(COSName.STRUCT_PARENT);
        }
    }
    ...
} catch (Exception e) {
    ...
}
{code}
Does that make sense?

> Possible issue on org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4780
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4780
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.18
>            Reporter: Jorge Spinsanti
>            Priority: Major
>
> When we try to merge several PDF files, we get the following stack trace:
> {code:java}
> Caused by: java.io.IOException: Error: Unknown annotation type COSName{ICCBased}
>       at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation.createAnnotation(PDAnnotation.java:172)
>       at org.apache.pdfbox.pdmodel.PDPage.getAnnotations(PDPage.java:696)
>       at org.apache.pdfbox.pdmodel.PDPage.getAnnotations(PDPage.java:663)
>       at org.apache.pdfbox.multipdf.PDFMergerUtility.appendDocument(PDFMergerUtility.java:801)
>       at org.apache.pdfbox.multipdf.PDFMergerUtility.legacyMergeDocuments(PDFMergerUtility.java:459)
>       at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:346)
>  {code}
> We cannot provide details about each PDF involved in the merge. Perhaps it is 
> an issue, but I'm not sure.



