[ 
https://issues.apache.org/jira/browse/TIKA-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-2026:
------------------------------
    Description: When some files (e.g. PDFs) are embedded in XLSX, PPT and 
PPTX, they are wrapped in an OLE compobj.  In TIKA-704, we added handling for 
these types of embedded files in DOC/DOCX files.  We need to make a few 
modifications to extract these in XLSX, PPT and PPTX.  (was: When some files 
(e.g. PDFs) are embedded in PPT and PPTX, they are wrapped in an OLE compobj.  
It would be nice if we could extract the actual files from these wrappers.)

> Handle OLE 2.0 embedded non-Office document in PPT/X and XLSX
> -------------------------------------------------------------
>
>                 Key: TIKA-2026
>                 URL: https://issues.apache.org/jira/browse/TIKA-2026
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>         Attachments: oleObject1.bin, testEmbedded3.pptx
>
>
> When some files (e.g. PDFs) are embedded in XLSX, PPT and PPTX, they are 
> wrapped in an OLE compobj.  In TIKA-704, we added handling for these types of 
> embedded files in DOC/DOCX files.  We need to make a few modifications to 
> extract these in XLSX, PPT and PPTX.
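
To illustrate what "wrapped in an OLE compobj" means at the byte level, here is a minimal sketch in plain Java. It is not Tika's or POI's code; the class and method names (OleWrapperSniffer, isOle2, looksLikeWrappedPdf) are hypothetical. It only checks for the OLE2 compound file signature and scans for a "%PDF-" marker after it. Real extraction has to walk the compound file's directory and sector chains, because a stream's bytes are not guaranteed to be contiguous, so treat this as a detection heuristic only.

```java
import java.nio.charset.StandardCharsets;

public class OleWrapperSniffer {
    // Signature at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Marker that begins a PDF file.
    private static final byte[] PDF_MAGIC =
        "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if the data starts with the OLE2 compound file signature. */
    public static boolean isOle2(byte[] data) {
        if (data.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (data[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Index of the first occurrence of pattern in data, or -1 if absent. */
    public static int indexOf(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Heuristic (assumption): an OLE2 container with a PDF marker inside it. */
    public static boolean looksLikeWrappedPdf(byte[] data) {
        return isOle2(data) && indexOf(data, PDF_MAGIC) > 0;
    }
}
```

Run against the bytes of a file such as the attached oleObject1.bin, looksLikeWrappedPdf would only hint that a PDF payload is present; pulling the payload out intact still requires a real compound-file parser.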



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
