Tim Allison created TIKA-3968:
---------------------------------

             Summary: Reconstruct embedded file names from recent docx files
                 Key: TIKA-3968
                 URL: https://issues.apache.org/jira/browse/TIKA-3968
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison
         Attachments: testWORD has attachment.docx

Several users have reported to me privately that Microsoft appears to have 
changed how attached files are stored in at least docx files (possibly pptx 
and xlsx too?).  Rather than storing the original file name, the file 
associates an EMF image with each attachment.  The file name that a human 
sees in the application is spelled/painted out in the EMF file, but does NOT 
exist in any of the XML.
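
To make the problem concrete, here is a rough sketch (not Tika code) of one way 
the painted name might be recovered.  It assumes the label is drawn with 
EMR_EXTTEXTOUTW records (record type 84 in the EMF format, UTF-16LE text), 
which I have not confirmed for every producer.  The function names 
emf_text_runs and painted_names are made up for this illustration, and the 
sketch scans every .emf part in the package rather than following the 
relationships that tie a particular EMF to a particular attachment.

```python
import struct
import zipfile

EMR_EXTTEXTOUTW = 84  # EMF record type that draws a UTF-16LE string


def emf_text_runs(emf_bytes):
    """Return the strings drawn by EMR_EXTTEXTOUTW records, in file order."""
    runs = []
    pos = 0
    while pos + 8 <= len(emf_bytes):
        # Every EMF record starts with a 4-byte type and a 4-byte total size.
        rec_type, rec_size = struct.unpack_from("<II", emf_bytes, pos)
        if rec_size < 8 or pos + rec_size > len(emf_bytes):
            break  # malformed or truncated record: stop rather than guess
        if rec_type == EMR_EXTTEXTOUTW and rec_size >= 52:
            # Character count sits at offset 44 and the string offset at 48,
            # both measured from the start of the record.
            n_chars, off_string = struct.unpack_from("<II", emf_bytes, pos + 44)
            start = pos + off_string
            end = start + 2 * n_chars
            if off_string >= 52 and end <= pos + rec_size:
                runs.append(emf_bytes[start:end].decode("utf-16-le", "replace"))
        pos += rec_size
    return runs


def painted_names(docx):
    """Map each .emf part in the package to the text runs painted in it.

    `docx` may be a path or a binary file-like object.
    """
    found = {}
    with zipfile.ZipFile(docx) as package:
        for part in package.namelist():
            if part.lower().endswith(".emf"):
                found[part] = emf_text_runs(package.read(part))
    return found
```

Calling painted_names() on the attached example file should list whatever text 
each EMF part draws.  A long file name may well be split across several text 
records (for instance if the label wraps), so the runs would likely need to be 
joined and sanity-checked before being trusted as a file name.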

I'm attaching an example file.


