[ https://issues.apache.org/jira/browse/PDFBOX-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975395#action_12975395 ]
Adam Nichols commented on PDFBOX-202:
-------------------------------------
I tested ExtractText.main(new String[]
{"C:\\Temp\\PDFBOX-202\\mozambique.pdf"}); and it did not throw any exceptions
at the current HEAD (which includes two patches I made today to
protect against NPEs). So this is fixed at the current HEAD.
No text is extracted in the txt file, but since Adobe Acrobat Standard 8, this
is expected. It's a corrupt PDF, so there's not much we can do with it, but
it's good that it doesn't throw an exception anymore.
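
For illustration only: the quoted trace fails on a list read at index 4 when
the list holds 4 elements. A minimal sketch of an operand-count guard that
avoids that kind of read follows. It assumes the failing operator is the PDF
"cm" operator, which takes six numeric operands. OperandGuard and its members
are hypothetical names and are not PDFBox code, and this is not a description
of the patches mentioned above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: checks the operand count
// before any operand is read by index.
final class OperandGuard {
    // Assumption: the PDF "cm" operator takes six numbers (a b c d e f).
    static final int CM_OPERAND_COUNT = 6;

    // Returns true only if the list holds at least `required` operands.
    static boolean hasEnoughOperands(List<?> operands, int required) {
        return operands != null && operands.size() >= required;
    }

    public static void main(String[] args) {
        // Four operands, as in the reported "Index: 4, Size: 4" failure.
        List<Float> truncated = Arrays.asList(1f, 0f, 0f, 1f);
        if (hasEnoughOperands(truncated, CM_OPERAND_COUNT)) {
            System.out.println("process operator");
        } else {
            System.out.println("skip malformed operator");
        }
    }
}
```

With a guard like this, a malformed operator in a corrupt file is skipped
instead of ending extraction with an exception.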
> Error on text extraction: java.lang.IndexOutOfBoundsException
> -------------------------------------------------------------
>
> Key: PDFBOX-202
> URL: https://issues.apache.org/jira/browse/PDFBOX-202
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Priority: Minor
> Fix For: 1.5.0
>
> Attachments: mozambique.pdf
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1565617
> Originally submitted by gagravarr on 2006-09-26 03:30.
> I'm trying to extract text from a pdf file
> (http://www.cifor.cgiar.org/mla/download/publication/mozambique.pdf),
> but I'm getting an IndexOutOfBoundsException on it:
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 4, Size: 4
>     at java.util.ArrayList.RangeCheck(ArrayList.java:546)
>     at java.util.ArrayList.get(ArrayList.java:321)
>     at org.pdfbox.util.operator.Concatenate.process(Concatenate.java:69)
>     at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:494)
>     at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:207)
>     at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:160)
>     at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:355)
>     at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:268)
>     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
>     at org.pdfbox.ExtractText.main(ExtractText.java:237)
> I've tried with 0.7.2, and 0.7.3-dev-20060920, and I
> get the same exception from both versions.
> Nick