David A. Patterson created TIKA-1005:
----------------------------------------

             Summary: In Microsoft Office Word 2010 documents, text inside a textbox is not extracted/parsed out.
                 Key: TIKA-1005
                 URL: https://issues.apache.org/jira/browse/TIKA-1005
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.2
         Environment: Windows 7, Windows Server 2008, Windows Server 2008 R2 (32-bit and 64-bit each)
            Reporter: David A. Patterson


Text inside a textbox is not extracted, regardless of whether the textbox is 
in the body, the header, or the footer. This happens with every parser tried 
(including AutoDetectParser) in combination with every ContentHandler tried. 
This is NOT a duplicate of TIKA-904: this issue specifically concerns the 
.docx file format.
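
To confirm that an affected .docx really contains textbox text, independently 
of Tika, the report above can be cross-checked with a small probe. The sketch 
below is only an illustration and not part of Tika: it assumes Java 9 or 
later, and the class name TextBoxProbe, its methods, and the sample fixture 
are hypothetical. It relies on WordprocessingML storing textbox content in 
w:txbxContent elements, and lists the text of each such element found in the 
XML parts under word/ (document, headers, footers). Word 2010 may store a 
textbox twice (a drawing plus a fallback copy), so duplicates are possible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Tika.
final class TextBoxProbe {
    // WordprocessingML main namespace, normally bound to the "w" prefix.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of every w:txbxContent element in the XML parts under word/. */
    public static List<String> textBoxText(InputStream docx) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        List<String> found = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.startsWith("word/") || !name.endsWith(".xml")) {
                    continue;
                }
                // Buffer the entry so the XML parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();
                Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
                NodeList boxes = doc.getElementsByTagNameNS(W_NS, "txbxContent");
                for (int i = 0; i < boxes.getLength(); i++) {
                    // Paragraphs inside one textbox are concatenated without separators.
                    found.add(boxes.item(i).getTextContent().trim());
                }
            }
        }
        return found;
    }

    /** Builds a zip holding only word/document.xml: a test fixture, not a valid Word file.
     *  boxText is inserted as-is and must not contain XML markup characters. */
    public static byte[] sampleDocx(String boxText) throws IOException {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
            + "<w:p><w:r><w:t>body text</w:t></w:r></w:p>"
            + "<w:p><w:r><w:pict><w:txbxContent><w:p><w:r><w:t>" + boxText
            + "</w:t></w:r></w:p></w:txbxContent></w:pict></w:r></w:p>"
            + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = sampleDocx("inside the textbox");
        // Prints only the textbox text, not "body text".
        System.out.println(textBoxText(new ByteArrayInputStream(docx)));
    }
}
```

If the probe lists text for a given file but the same strings are missing 
from the Tika output for that file, the file is a candidate test case for 
this issue.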

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
