[ 
https://issues.apache.org/jira/browse/TIKA-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison reassigned TIKA-3157:
---------------------------------

    Assignee: Tim Allison

> Missing content from .docx file with hyperlinked shape
> ------------------------------------------------------
>
>                 Key: TIKA-3157
>                 URL: https://issues.apache.org/jira/browse/TIKA-3157
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>            Reporter: Robert Kaulbach
>            Assignee: Tim Allison
>            Priority: Minor
>
> The attached .docx file was created in MS Office by drawing a rectangle and 
> then adding a hyperlink to it. While the hyperlink doesn't show inside 
> LibreOffice, it's still there and clickable when opened with MS Office.
> When parsing with Tika, the hyperlink attached to the shape is nowhere to be 
> found in the output. Enabling all Office/OOXML parse options in the context 
> has not helped.
>  
> When debugging, I can see that the "a:hlinkClick" tag with the link inside is 
> skipped in the startElement method of 
> org/apache/tika/parser/microsoft/ooxml/OOXMLWordAndPowerPointTextHandler.java 
> because "inACChoiceDepth" is greater than 0.
> The fallback tag, which separately has the link inside a "v:rect" tag, 
> doesn't seem to get processed either, so the link content is never saved.
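
The behaviour described above can be reproduced in isolation with a small SAX handler. This is only a sketch built on the JDK's own parser, not Tika's code and not a proposed patch: the class name HlinkSketch, the sample XML, the placeholder namespace URIs, and both boolean switches are made up for illustration. It assumes the hyperlink appears as an "a:hlinkClick" element inside "mc:Choice" and as an "href" attribute on "v:rect" inside "mc:Fallback", as the report suggests.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a text handler; none of this is Tika's API.
class HlinkSketch {
    // Simplified, made-up fragment; the namespace URIs are placeholders.
    static final String SAMPLE =
        "<w:p xmlns:w='urn:w' xmlns:mc='urn:mc' xmlns:a='urn:a'"
      + " xmlns:v='urn:v' xmlns:r='urn:r'>"
      + "<mc:AlternateContent>"
      + "<mc:Choice Requires='wps'><a:hlinkClick r:id='rId4'/></mc:Choice>"
      + "<mc:Fallback><v:rect href='http://example.com/'/></mc:Fallback>"
      + "</mc:AlternateContent></w:p>";

    // skipChoice: ignore every element nested inside mc:Choice.
    // readFallbackRect: record the href attribute of v:rect elements.
    static List<String> collectLinks(String xml, boolean skipChoice,
                                     boolean readFallbackRect) {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            int choiceDepth = 0; // plays the role of a "choice depth" counter

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("mc:Choice".equals(qName)) {
                    choiceDepth++;
                    return;
                }
                if (skipChoice && choiceDepth > 0) {
                    return; // a:hlinkClick is dropped here
                }
                if ("a:hlinkClick".equals(qName)) {
                    // Only the relationship id is available on this element.
                    found.add("rel:" + atts.getValue("r:id"));
                } else if (readFallbackRect && "v:rect".equals(qName)
                        && atts.getValue("href") != null) {
                    found.add(atts.getValue("href"));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("mc:Choice".equals(qName)) {
                    choiceDepth--;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return found;
    }
}
```

Calling collectLinks(SAMPLE, true, false) models the reported situation: the choice branch is skipped and the fallback shape is ignored, so no link is collected. Turning on readFallbackRect is one conceivable way to recover the link from the fallback branch; whether that is the right fix for Tika is a separate question.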



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
