[jira] [Commented] (TIKA-1317) Tika does not read text from Cover Pages and Tables Of Content of DOCX documents

Nick Burch (JIRA) Tue, 03 Jun 2014 07:17:26 -0700

    [ https://issues.apache.org/jira/browse/TIKA-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016545#comment-14016545 ]


Nick Burch commented on TIKA-1317:
----------------------------------

Any chance you could unzip one of these files (a .docx is a zip of XML files)
and identify which part(s) of the file, if any, contain the text you're
interested in? That will help us work out how much work it would be to implement.
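For anyone wanting to do that check without unzipping by hand, here is a minimal sketch, assuming Python 3 and only its standard library `zipfile` module. The helper name `find_parts_containing` is hypothetical, not part of Tika. It opens the .docx as a zip, scans every XML part, and reports which ones contain a given string. One caveat: Word often splits a phrase across several `w:r` runs in the XML, so search for a single distinctive word from the cover page or table of contents rather than a whole sentence.

```python
import zipfile


def find_parts_containing(docx_file, needle):
    """Return the names of the XML parts inside a .docx package whose raw
    content contains `needle`. `docx_file` may be a path or a binary
    file-like object. (Hypothetical helper, for illustration only.)"""
    matches = []
    with zipfile.ZipFile(docx_file) as package:
        for name in package.namelist():
            # Only XML parts carry text; skip images and other media.
            if not name.endswith(".xml"):
                continue
            data = package.read(name).decode("utf-8", errors="replace")
            # Plain substring match on the raw XML: a phrase split across
            # several <w:r> runs will NOT be found, so search one word.
            if needle in data:
                matches.append(name)
    return matches
```

For example, `find_parts_containing("docx_cover.docx", "SomeWordFromTheCover")` would list the part names (such as `word/document.xml`) where that word appears, which is the information needed here.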

> Tika does not read text from Cover Pages and Tables Of Content of DOCX 
> documents
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-1317
>                 URL: https://issues.apache.org/jira/browse/TIKA-1317
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Vladimir Glina
>         Attachments: docx_alltextoncover.docx, docx_cover.docx, 
> docx_sdtintable.docx
>
>
> Currently, Tika does not read text from Cover Pages and Tables Of Content of 
> DOCX documents. Examples of documents are attached. 
> To process documents, I used the standalone Tika-App utility, 
> tika-app-1.5.jar. I tried both specifying files to be processed in the 
> command line and selecting them from the utility menu.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
