[ https://issues.apache.org/jira/browse/PDFBOX-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alistair Oldfield updated PDFBOX-5278:
--------------------------------------
    Description: 
I have stumbled across a strange issue with a certain PDF where 
PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to 
fail.

 

I am not at liberty to share the PDF publicly, but am happy to DM the PDF 
privately if it helps.

 

The code to reproduce is pretty straightforward:

 

 
{code:java}
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;


public class AnnotationsTest {

        
        public static void main(String[] args) throws Exception {

                
                

                try( PDDocument doc = PDDocument.load(new File(args[0]));){

                        for (PDPage page : doc.getPages()) {
                                //this line will cause the doc to not be re-iterable in the next block,
                                //commenting it out will allow it to pass.
                                page.getAnnotations();

                        }
                        
                        System.out.println("We get here, no problem - not sure why we can't re-iterate again...");
                        
                        //doc.getPages() fails.
                        for (PDPage page : doc.getPages()) {
                                //do something
                                
                        }

                } 
        }
}

{code}
 The Exception:

 

Exception in thread "main" java.lang.IllegalStateException: Expected 'Page' but found COSName{Annot}
        at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:266)
        at org.apache.pdfbox.pdmodel.PDPageTree.access$400(PDPageTree.java:43)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:224)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:172)
        at AnnotationsTest.main(AnnotationsTest.java:28)

  was:
I have stumbled across a strange issue with a certain PDF where 
PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to 
fail.

 

I am not at liberty to share the PDF publicly, but am happy to DM the PDF 
privately if it helps.

 

The code to reproduce is pretty straightforward:

 

 
{code:java}
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;


public class AnnotationsTest {

        
        public static void main(String[] args) throws Exception {

                
                

                try( PDDocument doc = PDDocument.load(new File(args[0]));){

                        for (PDPage page : doc.getPages()) {
                                //this line will cause the doc to not be re-iterable in the next block,
                                //commenting it out will allow it to pass.
                                page.getAnnotations();

                        }
                        
                        System.out.println("We get here, no problem - not sure why we can't re-iterate again...");
                        
                        //doc.getPages() fails.
                        for (PDPage page : doc.getPages()) {
                                //do something
                                
                        }

                } 
        }
}

{code}
 The Exception:

 

Exception in thread "main" java.lang.IllegalStateException: Expected 'Page' but found COSName{Annot}
        at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:266)
        at org.apache.pdfbox.pdmodel.PDPageTree.access$400(PDPageTree.java:43)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:224)
        at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:172)
        at com.onlinedoctranslator.test.AnnotationsTest.main(AnnotationsTest.java:28)


> PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to 
> fail
> --------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5278
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5278
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.24
>            Reporter: Alistair Oldfield
>            Priority: Major
>
> I have stumbled across a strange issue with a certain PDF where 
> PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to 
> fail.
>  
> I am not at liberty to share the PDF publicly, but am happy to DM the PDF 
> privately if it helps.
>  
> The code to reproduce is pretty straightforward:
>  
>  
> {code:java}
> import java.io.File;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> public class AnnotationsTest {
>       
>       public static void main(String[] args) throws Exception {
>               
>               
>               try( PDDocument doc = PDDocument.load(new File(args[0]));){
>                       for (PDPage page : doc.getPages()) {
>                               //this line will cause the doc to not be re-iterable in the next block,
>                               //commenting it out will allow it to pass.
>                               page.getAnnotations();
>                       }
>                       
>                       System.out.println("We get here, no problem - not sure why we can't re-iterate again...");
>                       
>                       //doc.getPages() fails.
>                       for (PDPage page : doc.getPages()) {
>                               //do something
>                               
>                       }
>               } 
>       }
> }
> {code}
>  The Exception:
>  
> Exception in thread "main" java.lang.IllegalStateException: Expected 'Page' but found COSName{Annot}
>         at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:266)
>         at org.apache.pdfbox.pdmodel.PDPageTree.access$400(PDPageTree.java:43)
>         at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:224)
>         at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:172)
>         at AnnotationsTest.main(AnnotationsTest.java:28)


