Nick Burch created TIKA-2346:
--------------------------------
Summary: Allow Office format parsers to exclude parsing shapes
Key: TIKA-2346
URL: https://issues.apache.org/jira/browse/TIKA-2346
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.14
Reporter: Nick Burch
Fix For: 1.15
The Office format parsers already support including or excluding deleted text
and moved text. It would be good to support a similar option for shape-based
text, though probably not for PPT / PPTX, as those are almost entirely
shape-based!
(This has been done hackily in the Alfresco fork of Tika at
https://github.com/Alfresco/tika/commit/32aca3fd96816ad49b869a82c9ba0f02265f8744
but it would be good to do it properly.)
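
To make the request concrete, here is a minimal, self-contained sketch of how such a toggle could sit next to the existing deleted-text and moved-text switches. It is not Tika's actual API: the names OfficeOptions, Kind, Run and extract, and all default values, are hypothetical and exist only for illustration. A real implementation would live inside the individual Office parsers.

```java
import java.util.ArrayList;
import java.util.List;

public class ShapeTextSketch {

    /** Hypothetical per-parse options; field names and defaults are assumptions. */
    static class OfficeOptions {
        boolean includeDeletedContent = false;
        boolean includeMoveFromContent = false;
        // Assumed default: keep today's behaviour, i.e. shape text is extracted.
        boolean includeShapeBasedContent = true;
    }

    /** Where a piece of text came from in the (toy) document model. */
    enum Kind { BODY, DELETED, MOVED_FROM, SHAPE }

    /** A run of text tagged with its origin. */
    static class Run {
        final Kind kind;
        final String text;

        Run(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** Joins the runs that the options allow, one per line. */
    static String extract(List<Run> runs, OfficeOptions opts) {
        StringBuilder sb = new StringBuilder();
        for (Run run : runs) {
            boolean keep;
            switch (run.kind) {
                case DELETED:
                    keep = opts.includeDeletedContent;
                    break;
                case MOVED_FROM:
                    keep = opts.includeMoveFromContent;
                    break;
                case SHAPE:
                    keep = opts.includeShapeBasedContent;
                    break;
                default:
                    keep = true; // ordinary body text is always kept
            }
            if (keep) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(run.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = new ArrayList<>();
        runs.add(new Run(Kind.BODY, "Body paragraph"));
        runs.add(new Run(Kind.SHAPE, "Text box caption"));
        runs.add(new Run(Kind.DELETED, "Removed sentence"));

        OfficeOptions opts = new OfficeOptions();
        opts.includeShapeBasedContent = false; // the behaviour this issue asks for
        System.out.println(extract(runs, opts));
    }
}
```

Treating shape text as one more filtered category keeps the option symmetric with the deleted/moved toggles. A PPT / PPTX parser could simply ignore the flag, in line with the caveat above.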