[ https://issues.apache.org/jira/browse/PDFBOX-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836466#comment-13836466 ]

Hong-Thai Nguyen commented on PDFBOX-1787:
------------------------------------------

{code}
java -jar pdfbox-app-1.8.3.jar ExtractText E:\corrupt_file.pdf
{code}
This command never terminates.

With the -nonSeq option, the command fails with the following exception instead of hanging:
{code}
ExtractText failed with the following exception:
java.io.IOException: Missing end of file marker '%%EOF'
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.getStartxrefOffset(NonSequentialPDFParser.java:578)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:322)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:702)
        at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1258)
        at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:208)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
        at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
{code}
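
To make the exception above concrete: it says the parser could not find the '%%EOF' marker. The following is a minimal sketch of a standalone pre-check for that marker, using only the JDK. It is not PDFBox's implementation; the class name EofMarkerCheck, the method hasEofMarker and the 2048-byte scan window are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
class EofMarkerCheck {
    // Assumption: only the last 2048 bytes are scanned for the marker.
    static final int TAIL_BYTES = 2048;

    // Returns true if the given bytes contain the "%%EOF" marker.
    static boolean hasEofMarker(byte[] tail) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is safe here.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        return text.lastIndexOf("%%EOF") >= 0;
    }

    // Reads the tail of the file at 'path' and looks for the marker there.
    static boolean hasEofMarker(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            int toRead = (int) Math.min(TAIL_BYTES, length);
            byte[] tail = new byte[toRead];
            file.seek(length - toRead);
            file.readFully(tail);
            return hasEofMarker(tail);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EofMarkerCheck <file.pdf>");
            return;
        }
        System.out.println(hasEofMarker(args[0])
                ? "'%%EOF' marker found near the end of the file"
                : "no '%%EOF' marker near the end of the file");
    }
}
```

A file that fails this check is likely truncated or otherwise damaged at the end, which would be consistent with the -nonSeq failure; the check says nothing about why the default parser does not terminate.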

> pdfbox hangs on a corrupt PDF file
> ----------------------------------
>
>                 Key: PDFBOX-1787
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1787
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.3
>         Environment: all
>            Reporter: Hong-Thai Nguyen
>            Priority: Critical
>             Fix For: 1.8.4
>
>         Attachments: corrupt_file.pdf
>
>
> PDFBox hangs on the command line when processing the attached file.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
