Thank you, Nick!
IIUC the XLSX raw bytes are in the Package entry of an OLE2 wrapper. What
is the key for the OLE2 wrapper in the PPT? Sorry for missing this...
Have you gotten your hands on an example that you could share privately?
Happy to look through our regression corpus if I know exactly what to look
for.
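
In case it helps pin down what to look for, here is a rough sketch of a
first-pass check, assuming the only goal is to tell an OLE2 wrapper apart
from a bare OOXML (zip) payload by its leading bytes. ContainerSniffer and
sniff are hypothetical names for illustration; they are not part of POI or
Tika.

```java
import java.util.Arrays;

// Hypothetical helper, not a POI or Tika class: classifies a byte array
// by its leading magic bytes only.
final class ContainerSniffer {
    // OLE2 compound document header signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature ("PK\3\4"); OOXML files such as
    // XLSX are zip archives and normally start with it.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns "OLE2", "ZIP", or "UNKNOWN" based on the first bytes.
    static String sniff(byte[] data) {
        if (startsWith(data, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(data, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    // True when data is at least as long as prefix and begins with it.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

If the embedded object comes back as "OLE2" where an XLSX was expected, the
next step would presumably be to open it as a POIFS filesystem, pull out the
Package entry, and run the same check on those bytes, which I would expect
to report "ZIP".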
Thank you, again!
Cheers,
Tim
On Sat, Oct 10, 2020 at 7:20 AM Nick Burch <[email protected]> wrote:
> On Fri, 9 Oct 2020, Tim Allison wrote:
> > Do you think we should follow up on the Tika side? Do we know if we can
> > handle this?
>
> I thought we did, but checking POIFSContainerDetector I can't actually see
> that case covered....
>
> I think we (Tika) can handle it in a similar way to CompObj
>
> > Over on Stack Overflow <https://stackoverflow.com/q/64269294/685641>
> > there's a user who was getting what they thought was an embedded XLSX
> > file
> > out of a PPT, but finding it was an OLE2 wrapper with CompObj and Package
> > entries. The real XLSX was in the Package part. Passing the outer OLE2
> > stream to WorkbookFactory didn't work
>
> The list of entries to search for is in the comments on the question. We
> may actually have a similar file in our corpus we can use to test. I think
> it is triggered when an OOXML file is embedded in a PPT by some older
> versions of PowerPoint, as a compatibility wrapper
>
> Nick
>