Tim Allison created TIKA-4116:
---------------------------------

             Summary: Duplicate macros extracted from some embedded OLE2 containers
                 Key: TIKA-4116
                 URL: https://issues.apache.org/jira/browse/TIKA-4116
             Project: Tika
          Issue Type: Bug
            Reporter: Tim Allison


In some OLE2 containers with embedded objects, we potentially call 
extractMacros several times on the same POIFSFileSystem, which yields duplicate 
macros.

An example file is here: 
https://corpora.tika.apache.org/base/docs/govdocs1/527/527356.doc

The embedded {{_1152432709.xls}} has several attachments, including 
{{MBD000000B4.unknown}} and {{MBD0049C388.unknown}}, among others.  Each time we 
parse one of these embedded files, we call {{extractMacros}} on the same file 
system, {{root.getFileSystem()}}, which takes us back to {{_1152432709.xls}}.
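
One possible way to avoid the repeated extraction is to remember which file systems have already been processed and skip them on later visits. The following is only a minimal sketch of that idea, not the actual Tika or POI code: {{FakeFileSystem}}, {{FakeEntry}}, {{MacroDedupSketch}} and {{parseEmbedded}} are hypothetical stand-ins invented for illustration, and the set is identity-based on the assumption that every embedded entry returns the very same file system instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class MacroDedupSketch {

    /** Hypothetical stand-in for a POIFSFileSystem; NOT the POI class. */
    static final class FakeFileSystem {
        final String name;
        FakeFileSystem(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for an embedded entry that knows its owning file system. */
    static final class FakeEntry {
        final String name;
        final FakeFileSystem owner;
        FakeEntry(String name, FakeFileSystem owner) {
            this.name = name;
            this.owner = owner;
        }
        FakeFileSystem getFileSystem() { return owner; }
    }

    // Identity-based set: two entries count as sharing a file system only if
    // they return the same object instance (an assumption of this sketch).
    private final Set<FakeFileSystem> seen =
            Collections.newSetFromMap(new IdentityHashMap<>());

    // Records each (simulated) macro extraction so duplicates are easy to spot.
    final List<String> extracted = new ArrayList<>();

    /** Simulates parsing one embedded entry; extracts macros at most once per file system. */
    void parseEmbedded(FakeEntry entry) {
        FakeFileSystem fs = entry.getFileSystem();
        // Set.add returns false if the file system was already recorded.
        if (seen.add(fs)) {
            extracted.add("macros:" + fs.name);
        }
    }

    public static void main(String[] args) {
        FakeFileSystem xls = new FakeFileSystem("_1152432709.xls");
        MacroDedupSketch sketch = new MacroDedupSketch();
        // Both attachments lead back to the same file system.
        sketch.parseEmbedded(new FakeEntry("MBD000000B4.unknown", xls));
        sketch.parseEmbedded(new FakeEntry("MBD0049C388.unknown", xls));
        System.out.println(sketch.extracted); // prints [macros:_1152432709.xls]
    }
}
```

Without the {{seen}} check, the two calls above would record the same macros twice, which matches the duplication described here. Whether the real fix belongs at this level or higher up in the parser is left open.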



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
