[ 
https://issues.apache.org/jira/browse/PDFBOX-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836523#comment-13836523
 ] 

Hong-Thai Nguyen edited comment on PDFBOX-1787 at 12/2/13 2:01 PM:
-------------------------------------------------------------------

I agree that we can't do anything to extract the text content, but what we 
expect is that pdfbox should stop and report the error properly when it hits 
this kind of problem.
Is NonSequentialPDFParser the newer parser, and is it more robust in handling 
PDF files? Is its text extraction result the same as the current PDFParser's? 
I'm reading the code of PDFBOX-1104, and it seems that this parser improves 
extraction performance by starting extraction from an arbitrary page.

Thanks


was (Author: thaichat04):
I agree that we can't do anything to extract the text content, but what we 
expect is that pdfbox should stop and report the error properly when it hits 
this kind of problem.
Is NonSequentialPDFParser the newer parser, and is it more robust in handling 
PDF files? Is its text extraction result the same as the current PDFParser's?

Thanks

> pdfbox hangs on a corrupt PDF file
> ----------------------------------
>
>                 Key: PDFBOX-1787
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1787
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.3
>         Environment: windows
>            Reporter: Hong-Thai Nguyen
>         Attachments: corrupt_file.pdf
>
>
> pdfbox hangs when run from the command line on the attached file.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
