[ https://issues.apache.org/jira/browse/PDFBOX-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alistair Oldfield updated PDFBOX-5278:
--------------------------------------
Description:
I have stumbled across a strange issue with a certain PDF where
PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to
fail.
I am not at liberty to share the PDF publicly, but am happy to DM the PDF
privately if it helps.
The code to reproduce is pretty straightforward:
{code:java}
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class AnnotationsTest {
    public static void main(String[] args) throws Exception {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            for (PDPage page : doc.getPages()) {
                // This call makes the doc non-re-iterable in the next block;
                // commenting it out allows the second loop to pass.
                page.getAnnotations();
            }
            System.out.println("We get here, no problem - not sure why we can't re-iterate again...");
            // The second iteration over doc.getPages() fails.
            for (PDPage page : doc.getPages()) {
                // do something
            }
        }
    }
}
{code}
The Exception:
Exception in thread "main" java.lang.IllegalStateException: Expected 'Page' but found COSName\{Annot}
    at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:266)
    at org.apache.pdfbox.pdmodel.PDPageTree.access$400(PDPageTree.java:43)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:224)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:172)
    at AnnotationsTest.main(AnnotationsTest.java:28)
was:
I have stumbled across a strange issue with a certain PDF where
PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to
fail.
I am not at liberty to share the PDF publicly, but am happy to DM the PDF
privately if it helps.
The code to reproduce is pretty straightforward:
{code:java}
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class AnnotationsTest {
    public static void main(String[] args) throws Exception {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            for (PDPage page : doc.getPages()) {
                // This call makes the doc non-re-iterable in the next block;
                // commenting it out allows the second loop to pass.
                page.getAnnotations();
            }
            System.out.println("We get here, no problem - not sure why we can't re-iterate again...");
            // The second iteration over doc.getPages() fails.
            for (PDPage page : doc.getPages()) {
                // do something
            }
        }
    }
}
{code}
The Exception:
Exception in thread "main" java.lang.IllegalStateException: Expected 'Page' but found COSName\{Annot}
    at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:266)
    at org.apache.pdfbox.pdmodel.PDPageTree.access$400(PDPageTree.java:43)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:224)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:172)
    at com.onlinedoctranslator.test.AnnotationsTest.main(AnnotationsTest.java:28)
> PDPage.getAnnotations() causes subsequent calls to PDDocument.getPages() to
> fail
> --------------------------------------------------------------------------------
>
> Key: PDFBOX-5278
> URL: https://issues.apache.org/jira/browse/PDFBOX-5278
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.24
> Reporter: Alistair Oldfield
> Priority: Major
>
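The symptom above (a first pass that succeeds, then a second pass that rejects a page as an Annot) would be consistent with a dictionary being shared between the page tree and an annotation list, and having its type entry changed when it is read as an annotation. That is only a guess and is not confirmed anywhere in this report. The sketch below models the idea with plain Java objects; Dict, wrapAsAnnotation, requirePage and buildPages are hypothetical names for illustration and are not PDFBox API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedDictionarySketch {

    // Hypothetical stand-in for a PDF dictionary: a type entry plus an annotation list.
    static final class Dict {
        String type;
        final List<Dict> annots = new ArrayList<>();

        Dict(String type) {
            this.type = type;
        }
    }

    // ASSUMPTION (not taken from PDFBox): reading a dictionary as an
    // annotation overwrites its type entry with "Annot".
    static void wrapAsAnnotation(Dict dict) {
        dict.type = "Annot";
    }

    // Hypothetical stand-in for a page-tree check that rejects non-page entries.
    static void requirePage(Dict dict) {
        if (!"Page".equals(dict.type)) {
            throw new IllegalStateException("Expected 'Page' but found " + dict.type);
        }
    }

    // Walks the pages; returns "ok" or the message of the first failure.
    static String iterate(List<Dict> pages, boolean readAnnotations) {
        try {
            for (Dict page : pages) {
                requirePage(page);
                if (readAnnotations) {
                    for (Dict annot : page.annots) {
                        wrapAsAnnotation(annot);
                    }
                }
            }
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    // Malformed input for the model: the second page lists the first page
    // (already visited by then) as one of its annotations.
    static List<Dict> buildPages() {
        Dict first = new Dict("Page");
        Dict second = new Dict("Page");
        second.annots.add(first);
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        List<Dict> pages = buildPages();
        System.out.println(iterate(pages, true));   // first pass, reading annotations
        System.out.println(iterate(pages, false));  // second pass, iteration only
    }
}
```

In this model the first pass completes because the shared dictionary is only modified after it has already been checked as a page, and the second pass then fails on it. Whether the real PDF has such a shared reference would need to be checked against the file itself.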
--
This message was sent by Atlassian Jira
(v8.3.4#803005)