[ https://issues.apache.org/jira/browse/TIKA-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amit Humnabadkar updated TIKA-2441:
-----------------------------------
Attachment: doc001.zip
> Unable to extract text present in a table inside a textbox in MS Word
> ---------------------------------------------------------------------
>
> Key: TIKA-2441
> URL: https://issues.apache.org/jira/browse/TIKA-2441
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.15
> Environment: Windows, Linux; Apache Tika 1.15 used with Apache Solr 6.6.0
> Reporter: Amit Humnabadkar
> Attachments: doc001.zip
>
>
> Hello,
> I am using Tika 1.15 with Solr 6.6.0 for indexing and searching. This setup
> fails to index text that sits in a table inside a textbox in a Word document.
> The attached MS Word document contains two words:
> 1. Germany - in a table inside a textbox
> 2. Africa - directly in a textbox
> "Germany" is not indexed, while "Africa" is indexed successfully. It looks
> like Tika fails to extract content from a table inside a textbox.
> Please have a look.
> Thanks,
> Amit Humnabadkar
> [^doc001.zip]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)