[ 
https://issues.apache.org/jira/browse/TIKA-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685243#comment-17685243
 ] 

Tim Allison commented on TIKA-3968:
-----------------------------------

This comment on a blog post suggests that IconOnly marks the beginning and end 
of the sequence: 
https://www.codeproject.com/Articles/1307140/Parse-understand-and-demystify-Enhanced-Meta-Files?msg=5895351#xx5895351xx

Maybe take the string from the first comment record after the first IconOnly 
comment record?  There's a blank record and then a variable short record before 
the final IconOnly.
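
A rough sketch of that heuristic, in case it helps the discussion. It is not
the Tika implementation. It assumes the standard EMF record layout (4-byte
little-endian type and size, with EMR_COMMENT = 70 carrying a 4-byte data
size before the payload) and, as an unverified assumption, that the comment
payloads are UTF-16LE text. The function names are hypothetical.

```python
import struct

EMR_COMMENT = 70      # EMR_COMMENT record type in the EMF format
MARKER = "IconOnly"   # comment value reported to bracket the sequence


def iter_comment_payloads(emf_bytes):
    """Yield the raw payload of every EMR_COMMENT record, in file order."""
    offset = 0
    while offset + 8 <= len(emf_bytes):
        rec_type, rec_size = struct.unpack_from("<II", emf_bytes, offset)
        if rec_size < 8 or offset + rec_size > len(emf_bytes):
            break  # malformed or truncated record: stop rather than guess
        if rec_type == EMR_COMMENT and rec_size >= 12:
            (data_size,) = struct.unpack_from("<I", emf_bytes, offset + 8)
            start = offset + 12
            end = min(start + data_size, offset + rec_size)
            yield emf_bytes[start:end]
        offset += rec_size


def decode_comment(payload, encoding="utf-16-le"):
    """Decode a payload as text. The UTF-16LE default is an assumption."""
    return payload.decode(encoding, errors="ignore").strip("\x00").strip()


def guess_embedded_name(comments):
    """Return the first non-empty comment after the first IconOnly marker.

    Returns None if there is no marker, or if the closing marker is reached
    before any non-empty comment.
    """
    seen_marker = False
    for text in comments:
        if not seen_marker:
            seen_marker = text == MARKER
            continue
        if text == MARKER:
            return None
        if text:
            return text
    return None
```

Usage would be something like
guess_embedded_name(decode_comment(p) for p in iter_comment_payloads(data)),
where data is the raw bytes of one of the attached .emf files. Skipping empty
comments is a choice made here so that a blank record right after the marker
does not get returned as the name.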

> Reconstruct embedded file names from recent docx files
> ------------------------------------------------------
>
>                 Key: TIKA-3968
>                 URL: https://issues.apache.org/jira/browse/TIKA-3968
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: Microsoft_Word_Document.docx, 
> image-2023-02-06-15-46-05-678.png, image-2023-02-06-15-58-20-443.png, 
> image1-1.emf, image1-2.emf, image1.emf, image2.emf, image3.emf, 
> oleObject1.bin, oleObject2.bin, testWORD has attachment.docx
>
>
> I'm starting to see among several users communicating with me privately that 
> Microsoft has changed their basic behavior for files attached to at least 
> docx files (possibly pptx and xlsx?).  Rather than storing the original file 
> name, the file associates an EMF file with an attachment.  The filename that 
> a human sees in the application is spelled/painted out in the EMF file, but 
> does NOT exist in any of the XML.
> I'm attaching an example file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
