[ https://issues.apache.org/jira/browse/TIKA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated TIKA-989:
------------------------------------
Attachment: TIKA-989.patch
New patch ... I think it's ready. Instead of hardwiring the relationship ID
into the suggested embedded RESOURCE_NAME, I added a new
TikaMetadataKeys.EMBEDDED_RELATIONSHIP_ID key, which I set in the Metadata. I
also fixed TikaCLI -z so that the filename it writes each embedded file to is
prefixed with the relationship ID.
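
To illustrate the naming scheme described above, here is a minimal, self-contained sketch. It is not the code from the patch: a plain Map stands in for Tika's Metadata so the example runs without Tika on the classpath, the two key strings are assumed values for RESOURCE_NAME and EMBEDDED_RELATIONSHIP_ID, and the class name, the "file<N>" fallback and the "_" separator are hypothetical choices made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the logic an embedded-file extractor might use.
class EmbeddedNameSketch {
    // Assumed key strings; the real constants live in TikaMetadataKeys.
    static final String RESOURCE_NAME = "resourceName";
    static final String EMBEDDED_RELATIONSHIP_ID = "embeddedRelationshipId";

    // Builds the output filename for one embedded document.
    static String outputName(Map<String, String> metadata, int index) {
        String name = metadata.get(RESOURCE_NAME);
        if (name == null) {
            // No suggested name: fall back to a counter-based one (assumption).
            name = "file" + index;
        }
        String relId = metadata.get(EMBEDDED_RELATIONSHIP_ID);
        if (relId != null) {
            // Prefix with the relationship ID so the file can be matched
            // to the placeholder in the extracted text.
            name = relId + "_" + name;
        }
        return name;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(RESOURCE_NAME, "image1.emf");
        metadata.put(EMBEDDED_RELATIONSHIP_ID, "rId5");
        System.out.println(outputName(metadata, 0)); // prints "rId5_image1.emf"
    }
}
```

Keeping the relationship ID in its own metadata key, rather than baking it into the resource name, leaves the decision of whether and how to use it to the consumer; here only the file-writing step adds the prefix.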
> We don't extract a placeholder for documents embedded in a Word OOXML (.docx)
> document
> --------------------------------------------------------------------------------------
>
> Key: TIKA-989
> URL: https://issues.apache.org/jira/browse/TIKA-989
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 1.3
>
> Attachments: TIKA-989.patch, TIKA-989.patch
>
>
> In TIKA-956 we fixed the Word parser so that at the point where an embedded
> document appears, we output a <div class="embedded" id="_XXX"/> tag.
> It would be nice to do this for documents embedded in OOXML documents too.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira