Tim Allison created TIKA-4692:
---------------------------------
Summary: Move PPTX and DOCX SAX parsers closer to parity with DOM based parsers
Key: TIKA-4692
URL: https://issues.apache.org/jira/browse/TIKA-4692
Project: Tika
Issue Type: Task
Reporter: Tim Allison
SAX processing of PPTX and DOCX files is more robust than DOM processing when
elements are nested more deeply than usual. We should improve the SAX parsers
so that they extract the same features as the DOM-based parsers.
Ideally, we'd make the SAX-based parsers the default in 4.x (on a separate
ticket), but we should backport the updates to 3.x as well.