[
https://issues.apache.org/jira/browse/TIKA-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684890#comment-17684890
]
Tim Allison commented on TIKA-3968:
-----------------------------------
This is an example of an object within a run. The <v:shape/> element contains
the reference to the EMF file (rId6) that paints the file name, and the
<o:OLEObject/> element contains the reference to the actual embedded data (rId7).
{code:xml}
<w:r w:rsidR="004445EA">
  <w:object w14:anchorId="44A186C5" w:dxaOrig="1508" w:dyaOrig="983">
    <v:shape id="_x0000_i1027" o:ole="" style="width:75.5pt;height:49pt"
             type="#_x0000_t75">
      <v:imagedata o:title="" r:id="rId6"/>
    </v:shape>
    <o:OLEObject DrawAspect="Icon" ObjectID="_1731340470"
                 ProgID="AcroExch.Document.DC" ShapeID="_x0000_i1027" Type="Embed" r:id="rId7"/>
  </w:object>
</w:r>
{code}
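A rough sketch of how the two relationship ids in a run like the one above
could be paired up, so that the painted icon (EMF) can later be matched to the
embedded data. This is only an illustration in Python, not Tika's
implementation. The function name pair_ole_objects and the dictionary keys are
made up for this example, and the namespace declarations are added by hand
because the snippet above omits them.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for the prefixes used in the run; the snippet
# itself does not declare them, so they are supplied here.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "w14": "http://schemas.microsoft.com/office/word/2010/wordml",
    "v": "urn:schemas-microsoft-com:vml",
    "o": "urn:schemas-microsoft-com:office:office",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}

RUN_XML = """
<w:r w:rsidR="004445EA">
  <w:object w14:anchorId="44A186C5" w:dxaOrig="1508" w:dyaOrig="983">
    <v:shape id="_x0000_i1027" o:ole="" style="width:75.5pt;height:49pt"
             type="#_x0000_t75">
      <v:imagedata o:title="" r:id="rId6"/>
    </v:shape>
    <o:OLEObject DrawAspect="Icon" ObjectID="_1731340470"
                 ProgID="AcroExch.Document.DC" ShapeID="_x0000_i1027"
                 Type="Embed" r:id="rId7"/>
  </w:object>
</w:r>
"""


def pair_ole_objects(run_xml):
    """Return one dict per <w:object> that has both an icon and OLE data."""
    # Wrap the fragment in a dummy root that declares every prefix.
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    root = ET.fromstring(f"<root {decls}>{run_xml}</root>")

    r_id = "{%s}id" % NS["r"]  # fully qualified name of the r:id attribute
    pairs = []
    for obj in root.iter("{%s}object" % NS["w"]):
        image = obj.find("v:shape/v:imagedata", NS)
        ole = obj.find("o:OLEObject", NS)
        if image is None or ole is None:
            continue  # skip objects without both halves
        pairs.append({
            "icon_rel_id": image.get(r_id),  # EMF that paints the file name
            "data_rel_id": ole.get(r_id),    # the actual embedded data
            "prog_id": ole.get("ProgID"),
        })
    return pairs


PAIRS = pair_ole_objects(RUN_XML)
print(PAIRS)
```

Resolving rId6 and rId7 to actual parts would still require looking them up
in the document's relationships part, which this sketch does not attempt.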
> Reconstruct embedded file names from recent docx files
> ------------------------------------------------------
>
> Key: TIKA-3968
> URL: https://issues.apache.org/jira/browse/TIKA-3968
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: testWORD has attachment.docx
>
>
> I'm starting to hear from several users, communicating with me privately,
> that Microsoft has changed its basic behavior for files attached to at
> least docx files (possibly pptx and xlsx?). Rather than storing the original
> file name, the file associates an EMF file with an attachment. The file name
> that a human sees in the application is spelled/painted out in the EMF file,
> but does NOT exist in any of the XML.
> I'm attaching an example file.