Thomas created TIKA-2932:
----------------------------

             Summary: Filter Documents Meta Data
                 Key: TIKA-2932
                 URL: https://issues.apache.org/jira/browse/TIKA-2932
             Project: Tika
          Issue Type: Wish
          Components: parser
    Affects Versions: 1.22
            Reporter: Thomas


Hello!

Is there a way to filter out markers such as *[image: ]* and [bookmark] from 
the text I get when parsing documents? I need this because the metadata 
sometimes does not return the number of words in a document that contains 
images or tables.

*MetaData*

{"title":"Complete name,","description":null,"keywords":[],"language":"en","encoding":null,"author":"","generator":"Microsoft Office Word","pages":0,"words":0 ...

*Text*

[image: ] Certified Translation Certificate of Accuracy Your name here 
Translator/Interpreter Translated document: [bookmark: _GoBack]As a translator 
for Your Spanish Translation, Inc., I, Your name here, declare that I am a 
bilingual translator who is thoroughly familiar with the English and source 
language languages. I have translated the attached document to the best of my 
knowledge from source language into English and the English text is an accurate 
and true translation of the original document presented to the best of my 
knowledge and belief. Signed on June 1, 201 Sign here in blue ink Your name 
here Professional Translator for Day Translations, Inc. [bookmark: _MailAutoSig]
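
As a stopgap, the markers could presumably be stripped after parsing and the 
words counted from the cleaned text. A minimal sketch in plain Java, assuming 
the markers always look like the [image: ...] and [bookmark: ...] forms in the 
sample above. MarkerFilter, stripMarkers and countWords are hypothetical names 
for illustration and are not part of Tika:

```java
import java.util.regex.Pattern;

final class MarkerFilter {
    // Assumption: markers have the form "[image: ...]" or "[bookmark: ...]",
    // as seen in the sample text; other marker types are not handled.
    private static final Pattern MARKER =
            Pattern.compile("\\[(?:image|bookmark):[^\\]]*\\]");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Replaces each marker with a space, then collapses runs of whitespace.
    static String stripMarkers(String text) {
        String withoutMarkers = MARKER.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(withoutMarkers).replaceAll(" ").trim();
    }

    // Counts whitespace-separated tokens in the cleaned text.
    static int countWords(String text) {
        String cleaned = stripMarkers(text);
        return cleaned.isEmpty() ? 0 : cleaned.split(" ").length;
    }
}
```

Calling MarkerFilter.countWords(text) on the extracted text would then give a 
word count that does not depend on the "words" metadata field. This is only a 
post-processing workaround; a filter inside the parser would still be 
preferable.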

Please help!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
