[ https://issues.apache.org/jira/browse/TIKA-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Humnabadkar updated TIKA-2441:
-----------------------------------
    Attachment: doc001.zip

> Unable to extract text present in a table inside a textbox in MS Word
> ---------------------------------------------------------------------
>
>                 Key: TIKA-2441
>                 URL: https://issues.apache.org/jira/browse/TIKA-2441
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15
>         Environment: Windows, Linux, Apache Tika 1.15 used with Apache Solr-6.6.0
>            Reporter: Amit Humnabadkar
>         Attachments: doc001.zip
>
>
> Hello,
> I am using Tika-1.15 with Solr-6.6.0 for indexing and searching. This setup
> fails to index text that is in a table inside a textbox in a Word document.
> The attached MS Word document contains two words:
> 1. Germany - in a table inside a textbox
> 2. Africa - in a textbox
> Germany is not indexed, while Africa is indexed successfully. It looks
> like Tika fails to extract content from a table inside a textbox.
> Please have a look.
> Thanks,
> Amit Humnabadkar
> [^doc001.zip]
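
The report does not say whether the file inside doc001.zip is a .doc or a .docx. Assuming it is a .docx, a table inside a text box is stored in word/document.xml as a w:tbl element nested under a w:txbxContent element. The following is a minimal, stdlib-only Java sketch for checking what text a correct extractor should emit from text boxes: it collects the w:t text under each w:txbxContent, including text in nested table cells. The class TextboxTextCheck, the method textboxText and the sample XML are hypothetical illustrations, not part of Tika, and the sample is simplified rather than a schema-valid document.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the text of every text box in a WordprocessingML body.
public class TextboxTextCheck {
    // WordprocessingML main namespace used by .docx files
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one string per w:txbxContent element, concatenating all w:t text inside it. */
    static List<String> textboxText(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS to match
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

        List<String> result = new ArrayList<>();
        NodeList boxes = doc.getElementsByTagNameNS(W_NS, "txbxContent");
        for (int i = 0; i < boxes.getLength(); i++) {
            Element box = (Element) boxes.item(i);
            // Searches all descendants, so w:t inside w:tbl/w:tr/w:tc is included too
            NodeList texts = box.getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                sb.append(texts.item(j).getTextContent());
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Simplified sample mirroring the report: Germany in a table in a textbox,
        // Africa directly in a textbox
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:txbxContent><w:tbl><w:tr><w:tc><w:p><w:r><w:t>Germany</w:t>"
                + "</w:r></w:p></w:tc></w:tr></w:tbl></w:txbxContent>"
                + "<w:txbxContent><w:p><w:r><w:t>Africa</w:t></w:r></w:p></w:txbxContent>"
                + "</w:body></w:document>";
        System.out.println(textboxText(xml)); // prints [Germany, Africa]
    }
}
```

When run against the unzipped word/document.xml of a real file, each text box may appear twice, because Word can write both a DrawingML text box and a VML fallback inside mc:AlternateContent, and both contain w:txbxContent.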



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
