Amit Humnabadkar created TIKA-2441:
--------------------------------------
Summary: Unable to extract text present in a table inside a
textbox in MS Word
Key: TIKA-2441
URL: https://issues.apache.org/jira/browse/TIKA-2441
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.15
Environment: Windows, Linux, Apache Tika 1.15 used with Apache
Solr 6.6.0
Reporter: Amit Humnabadkar
Hello,
I am using Tika 1.15 with Solr 6.6.0 for indexing and searching. This setup
fails to index text that is inside a table within a textbox in a Word document.
An MS Word document contains two words:
1. Germany - present in a table inside a textbox
2. Africa - present in a textbox
Germany is not indexed, while Africa is indexed successfully. It looks like
Tika fails to extract content that sits in a table inside a textbox.
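As a rough way to confirm where each word sits in the file, independent of
Tika and Solr, here is a minimal sketch using only the Python standard library.
It assumes the document is a .docx (OOXML) file, which is not stated above; the
function names and the file name doc001.docx are illustrative only. It lists
every piece of text found inside a textbox and marks whether that text is
nested in a table.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml in a .docx file.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_document_xml(path):
    """Return the raw bytes of word/document.xml from a .docx file."""
    with zipfile.ZipFile(path) as archive:
        return archive.read("word/document.xml")


def textbox_texts(document_xml):
    """Return (in_table, text) pairs for each w:t element inside a textbox.

    A textbox body is stored in w:txbxContent. Word often writes the same
    textbox twice (a drawing and a VML fallback), so duplicates are expected.
    """
    root = ET.fromstring(document_xml)
    results = []
    for box in root.iter(f"{{{W}}}txbxContent"):
        # Remember which text elements sit under a table within this textbox.
        in_table = set()
        for table in box.iter(f"{{{W}}}tbl"):
            for text_node in table.iter(f"{{{W}}}t"):
                in_table.add(id(text_node))
        for text_node in box.iter(f"{{{W}}}t"):
            results.append((id(text_node) in in_table, text_node.text or ""))
    return results
```

For example, textbox_texts(read_document_xml("doc001.docx")) should report
Germany with in_table set to True and Africa with in_table set to False if the
document is structured as described. That would suggest the text is present in
the file and the loss happens during extraction.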
Please have a look.
Thanks,
Amit Humnabadkar
[^doc001.zip]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)