[ https://issues.apache.org/jira/browse/PDFBOX-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836523#comment-13836523 ]
Hong-Thai Nguyen edited comment on PDFBOX-1787 at 12/2/13 2:01 PM:
-------------------------------------------------------------------
I agree that we can't do anything to extract the text content, but what we
expect is that pdfbox stops and reports the problem properly when it runs into
this kind of file.
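Until the parser itself fails fast, a caller could get the "stop and report"
behaviour by putting a timeout around the extraction call. The following is
only a minimal sketch under assumptions: ExtractionGuard and runWithTimeout are
hypothetical names, not part of the PDFBox API, and the Callable stands in for
whatever extraction call is actually used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not a PDFBox class.
public class ExtractionGuard {

    /**
     * Runs the task on a worker thread and gives up after timeoutMillis,
     * throwing an exception instead of blocking the caller forever.
     */
    static String runWithTimeout(Callable<String> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "extraction-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; this only stops tasks that react to interruption.
            future.cancel(true);
            throw new IllegalStateException(
                    "extraction did not finish within " + timeoutMillis + " ms", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for an extraction call that never returns on a corrupt file.
        Callable<String> hanging = () -> {
            Thread.sleep(60_000);
            return "never reached";
        };
        try {
            runWithTimeout(hanging, 200);
        } catch (IllegalStateException e) {
            System.out.println("Reported: " + e.getMessage());
        }
    }
}
```

Note that a parser spinning in a tight loop will not notice the interrupt, which
is why the worker is a daemon thread here; running the extraction in a separate
process would be the stricter option.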
Is NonSequentialPDFParser the newer parser, with more robust handling of PDF
files? Is its text extraction result the same as that of the current PDFParser?
I'm reading the code of PDFBOX-1104, and it seems that this parser improves
extraction performance by starting extraction from a random page.
Thanks
was (Author: thaichat04):
I agree that we can't do anything to extract the text content, but what we
expect is that pdfbox stops and reports the problem properly when it runs into
this kind of file.
Is NonSequentialPDFParser the newer parser, with more robust handling of PDF
files? Is its text extraction result the same as that of the current PDFParser?
Thanks
> pdfbox hangs on a corrupt PDF file
> ----------------------------------
>
> Key: PDFBOX-1787
> URL: https://issues.apache.org/jira/browse/PDFBOX-1787
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.3
> Environment: windows
> Reporter: Hong-Thai Nguyen
> Attachments: corrupt_file.pdf
>
>
> pdfbox hangs when run from the command line on the attached file.