Tim Allison created TIKA-3968:
---------------------------------
Summary: Reconstruct embedded file names from recent docx files
Key: TIKA-3968
URL: https://issues.apache.org/jira/browse/TIKA-3968
Project: Tika
Issue Type: Task
Reporter: Tim Allison
Attachments: testWORD has attachment.docx
Several users have told me privately that Microsoft appears to have changed
how files attached to docx files (and possibly pptx and xlsx?) are stored.
Rather than storing the attachment's original file name, the docx associates
an EMF image with the attachment. The file name that a human sees in the
application is drawn (painted) into the EMF image, but it does NOT appear in
any of the XML.
I'm attaching an example file.
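For anyone who wants to check a file by hand, here is a minimal Python sketch
(not Tika code) that lists the embedded parts and EMF images in a docx and
reports which XML parts, if any, contain the file name shown in the
application. It assumes the conventional package layout, with embedded objects
under word/embeddings/ and images under word/media/. The function name
summarize_docx_attachments and the visible_name parameter are hypothetical,
chosen only for illustration.

```python
import io
import zipfile


def summarize_docx_attachments(docx, visible_name=None):
    """Summarize attachment-related parts of a docx package.

    docx may be a path or a binary file-like object.
    visible_name is the file name a user sees in the application (optional).
    Assumption: the package uses the conventional word/embeddings/ and
    word/media/ folders; a producer is free to use other part names.
    """
    with zipfile.ZipFile(docx) as zf:
        # Skip explicit directory entries, which end with "/".
        names = [n for n in zf.namelist() if not n.endswith("/")]

        # Raw embedded payloads (OLE containers, embedded packages, etc.).
        embeddings = [n for n in names if n.startswith("word/embeddings/")]

        # EMF images, which may be the rendered icon-plus-label previews.
        emf_images = [
            n for n in names
            if n.startswith("word/media/") and n.lower().endswith(".emf")
        ]

        # Naive byte search for the visible name in XML and .rels parts.
        # This misses names that are XML-escaped or split across elements.
        xml_hits = []
        if visible_name is not None:
            needle = visible_name.encode("utf-8")
            for n in names:
                if n.endswith((".xml", ".rels")) and needle in zf.read(n):
                    xml_hits.append(n)

    return {
        "embeddings": embeddings,
        "emf_images": emf_images,
        "xml_hits": xml_hits,
    }
```

Running this against the attached file with the name shown in Word as
visible_name should, if the description above holds, return at least one EMF
image and an empty xml_hits list. An empty list from this naive search is only
a hint, not proof that the name is absent from the XML.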