Tim Allison created TIKA-1948:
---------------------------------
Summary: Catch exceptions per page in PDFParser
Key: TIKA-1948
URL: https://issues.apache.org/jira/browse/TIKA-1948
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
Assignee: Tim Allison
Priority: Minor
In a discussion with [~tilman] somewhere (I can't recall where), I think he
observed that we weren't wrapping each page in its own try/catch. If an
exception is thrown on an early page of a problematic PDF, it might still be
possible to extract text from the later pages.
With minimal modifications, we could wrap each page in its own try/catch,
store the caught exceptions, and then rethrow the first one after the parse
finishes.
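A minimal sketch of that approach in Java, assuming a simple loop over page indices. `PerPageParser` and `PageProcessor` are hypothetical names used only for illustration. They are not Tika or PDFBox APIs. Attaching the later exceptions to the first one as suppressed exceptions goes beyond what this ticket proposes, which is only to throw the first.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of per-page exception handling. PerPageParser and PageProcessor
 * are hypothetical names, not Tika or PDFBox APIs.
 */
class PerPageParser {

    /** Hypothetical callback that extracts text from one page (0-based index). */
    @FunctionalInterface
    public interface PageProcessor {
        void processPage(int pageIndex) throws Exception;
    }

    /** Exceptions caught during the most recent parse, in page order. */
    private final List<Exception> caught = new ArrayList<>();

    public List<Exception> getCaughtExceptions() {
        return caught;
    }

    /**
     * Attempts every page, catching exceptions per page so that a failure on
     * an early page does not stop extraction from later pages. Once all pages
     * have been attempted, the first caught exception is rethrown. Later ones
     * are attached as suppressed exceptions (an extra choice, not in the ticket).
     */
    public void parse(int pageCount, PageProcessor processor) throws Exception {
        caught.clear();
        for (int i = 0; i < pageCount; i++) {
            try {
                processor.processPage(i);
            } catch (Exception e) {
                // Record the failure and move on to the next page.
                caught.add(e);
            }
        }
        if (!caught.isEmpty()) {
            Exception first = caught.get(0);
            for (int j = 1; j < caught.size(); j++) {
                Exception later = caught.get(j);
                // Throwable.addSuppressed rejects self-suppression, so skip
                // the case where the same instance was thrown twice.
                if (later != first) {
                    first.addSuppressed(later);
                }
            }
            throw first;
        }
    }
}
```

With this shape, a caller still sees a failure for a broken PDF, but text from the pages that parsed cleanly has already been emitted by the time the exception propagates.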