[ https://issues.apache.org/jira/browse/TIKA-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2026:
------------------------------
Description: When some files (e.g. PDFs) are embedded in XLSX, PPT and
PPTX, they are wrapped in an OLE compobj. In TIKA-704, we added handling for
this type of embedded file in DOC/DOCX. We need to make a few
modifications to extract them from XLSX, PPT and PPTX as well. (was: When some
files (e.g. PDFs) are embedded in PPT and PPTX, they are wrapped in an OLE compobj.
It would be nice if we could extract the actual files from these wrappers.)
> Handle OLE 2.0 embedded non-Office document in PPT/X and XLSX
> -------------------------------------------------------------
>
> Key: TIKA-2026
> URL: https://issues.apache.org/jira/browse/TIKA-2026
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Attachments: oleObject1.bin, testEmbedded3.pptx
>
>
> When some files (e.g. PDFs) are embedded in XLSX, PPT and PPTX, they are
> wrapped in an OLE compobj. In TIKA-704, we added handling for this type of
> embedded file in DOC/DOCX. We need to make a few modifications to
> extract them from XLSX, PPT and PPTX as well.
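
To illustrate what "wrapped" means here, below is a rough, standard-library-only Java sketch. It is not the Tika or POI code path from TIKA-704. It only checks whether a blob such as an oleObject*.bin part begins with the OLE2 compound file signature, then naively scans for a "%PDF-" header. The class and method names (OleWrapperSniffer, isOle2, extractPdfCandidate) are hypothetical, and the contiguous scan is an assumption that can fail: OLE2 streams are stored in sectors and may be fragmented, so a real extractor has to walk the compound file structure instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika or POI.
class OleWrapperSniffer {
    // Signature found at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // PDF files begin with this ASCII header.
    private static final byte[] PDF_MAGIC =
        "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if the data starts with the OLE2 signature.
    static boolean isOle2(byte[] data) {
        return data.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Index of the first occurrence of needle in haystack, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Naive heuristic: returns the bytes from the first "%PDF-" header to the
    // end of the wrapper, or null if the input is not OLE2 or has no PDF header.
    // Assumes the embedded stream is stored contiguously, which is not guaranteed.
    static byte[] extractPdfCandidate(byte[] wrapper) {
        if (!isOle2(wrapper)) {
            return null;
        }
        int start = indexOf(wrapper, PDF_MAGIC);
        return start < 0 ? null : Arrays.copyOfRange(wrapper, start, wrapper.length);
    }
}
```

A heuristic like this is only useful for quickly confirming that an attachment such as oleObject1.bin is an OLE2 container holding a PDF; the returned bytes may include trailing container data after the end of the PDF.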