Tim Allison created TIKA-4226:
---------------------------------

             Summary: Use jsoup for epubs
                 Key: TIKA-4226
                 URL: https://issues.apache.org/jira/browse/TIKA-4226
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison


We're getting quite a few XML exceptions when parsing EPUBs (roughly 1k out of 
8k total). We should use jsoup to handle the contents of EPUBs more robustly.
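
For context, here is a minimal sketch (not Tika's actual code) of why strict
XML parsing is brittle here. It uses only the JDK's DocumentBuilder and shows
that an unclosed <br> or an undeclared &nbsp; entity is a fatal error for an
XML parser, whereas a lenient HTML parser such as jsoup is designed to
tolerate that kind of markup. The class name, the helper name, and the sample
markup are made up for illustration; that real-world EPUBs fail for these
particular reasons is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Tika.
class StrictXmlDemo {

    /** Returns true if the JDK's XML parser accepts the markup as well-formed. */
    public static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser's
            // default handler from printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Not well-formed XML: this is the kind of exception described above.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed XHTML fragment.
        System.out.println(parsesAsXml(
                "<html><body><p>Chapter 1<br/></p></body></html>")); // true
        // HTML-style unclosed <br>.
        System.out.println(parsesAsXml(
                "<html><body><p>Chapter 1<br></p></body></html>"));  // false
        // &nbsp; used without a DTD that declares it.
        System.out.println(parsesAsXml(
                "<html><body><p>a&nbsp;b</p></body></html>"));       // false
    }
}
```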

This is a proposal for 3.x. WDYT?



