Thomas created TIKA-2932:
----------------------------
Summary: Filter Documents Meta Data
Key: TIKA-2932
URL: https://issues.apache.org/jira/browse/TIKA-2932
Project: Tika
Issue Type: Wish
Components: parser
Affects Versions: 1.22
Reporter: Thomas
Hello!
Is there a way to filter out tags like *[image: ]* and *[bookmark: ]* from the
text I get while parsing documents? I need this because the metadata sometimes
does not return the number of words for a document that contains images or tables.
*MetaData*
{"title":"Complete name,","description":null,"keywords":[],"language":"en",
"encoding":null,"author":"","generator":"Microsoft Office Word","pages":0,
"words":0 ...
*Text*
[image: ] Certified Translation Certificate of Accuracy Your name here
Translator/Interpreter Translated document: [bookmark: _GoBack]As a translator
for Your Spanish Translation, Inc., I, Your name here, declare that I am a
bilingual translator who is thoroughly familiar with the English and source
language languages. I have translated the attached document to the best of my
knowledge from source language into English and the English text is an accurate
and true translation of the original document presented to the best of my
knowledge and belief. Signed on June 1, 201 Sign here in blue ink Your name
here Professional Translator for Day Translations, Inc. [bookmark: _MailAutoSig]
Please help!
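
One possible workaround is to post-process the extracted text, sketched below. This is not a Tika feature: the marker pattern is an assumption based only on the sample text above, and the helper names strip_markers and count_words are hypothetical. It removes the [image: ...] and [bookmark: ...] markers and then counts whitespace-separated words, which could stand in for the missing "words" value.

```python
import re

# Assumption: markers look like "[image: ...]" or "[bookmark: ...]", as in the
# sample text above. The pattern is illustrative, not something Tika defines.
MARKER = re.compile(r"\[(?:image|bookmark):[^\]]*\]")


def strip_markers(text: str) -> str:
    """Remove [image: ...] and [bookmark: ...] markers and collapse whitespace."""
    # Replace each marker with a space so adjacent words are not glued together.
    without_markers = MARKER.sub(" ", text)
    return " ".join(without_markers.split())


def count_words(text: str) -> int:
    """Count whitespace-separated tokens once the markers are removed."""
    return len(strip_markers(text).split())


sample = (
    "[image: ] Certified Translation Certificate of Accuracy Your name here "
    "Translator/Interpreter Translated document: [bookmark: _GoBack]As a translator"
)
print(strip_markers(sample))
print(count_words(sample))
```

Each marker is replaced with a space rather than an empty string, because in the sample a bookmark sits directly against the next word ("[bookmark: _GoBack]As"). The count is a plain whitespace split, so it may differ from the word count that Microsoft Word itself reports.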