[ 
https://issues.apache.org/jira/browse/TIKA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745659#comment-17745659
 ] 

Tim Allison commented on TIKA-4093:
-----------------------------------

Also found a CorelDRAW stream here:
https://corpora.tika.apache.org/base/docs/commoncrawl3/AZ/AZG2X4VXB3KIEDT3OVZC4R645KU5VSOF

I was tempted to extract that as the Contents, but the raw CorelDRAW stream
contains a CorelDRAW file, and there are supplementary streams with extra data
that may be used(?) to render the image.

A deep dive on these streams would be useful.
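
As a starting point for that deep dive, here is a rough sketch of the
"primary stream vs. supplementary streams" idea. It is not Tika's actual
logic: the function name classify_streams, the priority list, and the stream
name "CorelDRAW" are assumptions made up for illustration, and the sample
names "\x01CompObj" and "\x03ObjInfo" are just typical OLE2 housekeeping
streams.

```python
# Hypothetical priority order for picking the main payload stream of one
# object pool entry. Names are compared case-insensitively.
PREFERRED_STREAMS = [
    "contents",
    "\x01ole10native",
    "equation native",
    "coreldraw",  # assumed stream name, based on the file linked above
]


def classify_streams(stream_names):
    """Split stream names into (primary, supplementary).

    primary is the first stream matching PREFERRED_STREAMS, or None if
    nothing matches. supplementary is every other stream, in input order.
    """
    by_lower = {name.lower(): name for name in stream_names}
    primary = None
    for candidate in PREFERRED_STREAMS:
        if candidate in by_lower:
            primary = by_lower[candidate]
            break
    supplementary = [name for name in stream_names if name != primary]
    return primary, supplementary


if __name__ == "__main__":
    primary, extra = classify_streams(["\x01CompObj", "CorelDRAW", "\x03ObjInfo"])
    print("primary:", repr(primary))
    print("supplementary:", extra)
```

Reporting the supplementary streams instead of silently dropping them would at
least make it visible when an entry carries extra data next to the main
payload.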

> Deep dive on OLE2 object pools
> ------------------------------
>
>                 Key: TIKA-4093
>                 URL: https://issues.apache.org/jira/browse/TIKA-4093
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Trivial
>         Attachments: 6TJD5TNSDB73QV6XEW46CPR6MSHXRRBN
>
>
> In looking at some OLE2 files, I noticed the attached file.  This has an 
> object pool that we're not properly processing.  We're looking for the 
> "contents" stream, but other streams might be more relevant here... maybe the 
> OLE10Native "Equation Native"? 
> We should look into this at some point.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
