[ https://issues.apache.org/jira/browse/PDFBOX-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836466#comment-13836466 ]
Hong-Thai Nguyen commented on PDFBOX-1787:
------------------------------------------
{code}
java -jar pdfbox-app-1.8.3.jar ExtractText E:\corrupt_file.pdf
{code}
This command never terminates.
With the -nonSeq option, it fails with the following exception:
{code}
ExtractText failed with the following exception:
java.io.IOException: Missing end of file marker '%%EOF'
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.getStartxrefOffset(NonSequentialPDFParser.java:578)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:322)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:702)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1258)
at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:208)
at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
{code}
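For anyone triaging similar files, the exception above points at a missing '%%EOF' trailer. The following is a minimal sketch of a stand-alone pre-check, using only the JDK, that looks for that marker near the end of a file before handing it to a parser. It is not how PDFBox locates the marker internally; the class name EofMarkerCheck and the 2048-byte window (TAIL_BYTES) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class EofMarkerCheck {
    // Assumed scan window: how many trailing bytes are searched for the marker.
    static final int TAIL_BYTES = 2048;

    /** Returns true if "%%EOF" occurs within the last TAIL_BYTES bytes of data. */
    static boolean hasEofMarker(byte[] data) {
        int start = Math.max(0, data.length - TAIL_BYTES);
        // ISO-8859-1 maps every byte to exactly one char, so binary content is safe to scan.
        String tail = new String(data, start, data.length - start, StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    /** Reads only the tail of the file at the given path and checks it for the marker. */
    static boolean hasEofMarker(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            int n = (int) Math.min(len, TAIL_BYTES);
            byte[] tail = new byte[n];
            raf.seek(len - n);
            raf.readFully(tail);
            return hasEofMarker(tail);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + ": " + (hasEofMarker(path) ? "%%EOF found" : "%%EOF missing"));
        }
    }
}
```

Running it as `java EofMarkerCheck E:\corrupt_file.pdf` would report whether the trailer is present. A file can contain the marker and still be corrupt in other ways, so this only helps separate truncated files from other kinds of damage.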
> pdfbox hangs on a corrupt PDF file
> ----------------------------------
>
> Key: PDFBOX-1787
> URL: https://issues.apache.org/jira/browse/PDFBOX-1787
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.3
> Environment: all
> Reporter: Hong-Thai Nguyen
> Priority: Critical
> Fix For: 1.8.4
>
> Attachments: corrupt_file.pdf
>
>
> PDFBox hangs when run from the command line on the attached file.
--
This message was sent by Atlassian JIRA (v6.1#6144)