[ 
https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409024#comment-13409024
 ] 

Nick Burch commented on TIKA-948:
---------------------------------

If someone feels keen, we could add CompObj decoding. Once that's in place, we 
could try to extract helpful data (such as the file type) from it. However, it 
doesn't look like it stores the file type as such, just the application the 
data should be passed to, which might well not be specific enough.
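
For anyone wanting to experiment, here is a rough sketch of what pulling the
application name out of a CompObj stream might look like. It is not Tika or
POI code. It assumes the layout described in MS-OLEDS (a 28-byte header
followed by a length-prefixed ANSI "user type" string), and the class and
method names are made up for illustration. It only reads that first string.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika or POI.
class CompObjSketch {
    // Assumption: the CompObj stream starts with a 28-byte header
    // (4 reserved bytes, 4 version bytes, 20 reserved bytes).
    static final int HEADER_SIZE = 28;

    /**
     * Returns the ANSI user type string (roughly, the name of the
     * application/class the embedded data belongs to), or null if the
     * bytes don't look like the assumed layout.
     */
    static String readAnsiUserType(byte[] data) {
        if (data == null || data.length < HEADER_SIZE + 4) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(HEADER_SIZE);

        // Assumption: a 4-byte little-endian length, counting the
        // trailing NUL terminator, precedes the string bytes.
        int length = buf.getInt();
        if (length <= 0 || length > buf.remaining()) {
            return null;
        }
        byte[] raw = new byte[length];
        buf.get(raw);

        // Drop trailing NUL terminator(s) before decoding.
        int end = length;
        while (end > 0 && raw[end - 1] == 0) {
            end--;
        }
        return new String(raw, 0, end, StandardCharsets.ISO_8859_1);
    }
}
```

Even if this works, the string it returns names an application or object
class rather than a file type, so it would only ever be a hint for detection.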

The Quill96 thing comes from an old Works file, from an old bug report 
somewhere. IIRC we didn't get a sample file to go with it, but it might be good 
for someone to work out which mailing list discussion / old issue it's 
associated with, so we can check...
                
> Embedded PDF extracted incorrectly as MS Works file from Word 97-2003 doc
> -------------------------------------------------------------------------
>
>                 Key: TIKA-948
>                 URL: https://issues.apache.org/jira/browse/TIKA-948
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: EmbeddedPDF.doc, TIKA-948.patch, TIKA-948.patch
>
>
> This is just like TIKA-704, except that issue was for an OOXML Word
> doc but this is for the older Word 97-2003 format.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
