On Fri, 9 Oct 2020, Tim Allison wrote:
Do you think we should follow up on the Tika side?  Do we know if we can
handle this?

I thought we did, but checking POIFSContainerDetector I can't actually see that case covered....

I think we (Tika) can handle it in a similar way to CompObj.

Over on Stack Overflow <https://stackoverflow.com/q/64269294/685641>
there's a user who was getting what they thought was an embedded XLSX file
out of a PPT, but finding it was an OLE2 wrapper with CompObj and Package
entries. The real XLSX was in the Package part. Passing the outer OLE2
stream to WorkbookFactory didn't work.

The list of entries to search for is in the comments on the question. We may actually have a similar file in our corpus we can use to test. I think it is triggered when an OOXML file is embedded in a PPT by some older versions of PowerPoint, as a compatibility wrapper.
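
As a rough sketch of the check described above (not what POIFSContainerDetector does today), something like the following could flag the wrapper from the root entry names. The class and method names are hypothetical, and only the two entries mentioned in this mail are checked; the fuller list in the Stack Overflow comments is not reproduced here. It also assumes the CompObj stream may show up with the usual leading 0x01 character in its name.

```java
import java.util.Set;

// Hypothetical helper for illustration only; not part of POI or Tika.
final class WrappedOoxmlCheck {
    // Assumption: the CompObj stream is normally stored with a leading
    // 0x01 character, so both spellings are accepted below.
    static final String COMP_OBJ = "\u0001CompObj";
    static final String PACKAGE = "Package";

    private WrappedOoxmlCheck() {
    }

    /**
     * Returns true when the root directory of an OLE2 file contains both
     * a CompObj entry and a Package entry, which is the layout described
     * for an OOXML file wrapped inside OLE2.
     */
    static boolean looksLikeWrappedOoxml(Set<String> rootEntryNames) {
        boolean hasCompObj = rootEntryNames.contains(COMP_OBJ)
                || rootEntryNames.contains("CompObj");
        return hasCompObj && rootEntryNames.contains(PACKAGE);
    }
}
```

With POI, the names could presumably come from the root DirectoryNode's getEntryNames(), and when the check passes the Package entry's stream (rather than the outer OLE2 stream) would be what gets handed to WorkbookFactory.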

Nick
