[ 
https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574779#comment-17574779
 ] 

Tim Allison commented on TIKA-3827:
-----------------------------------

It looks like the embedded files do not have BMP headers.  Are they just the 
raw bytes after what would be the header?  If you extract them (attached), are 
you able to open them?

Magic-based detection isn't working because they don't have headers.  I'm 
working on adding a mime type hint if \wbitmap is encountered.
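
To illustrate why signature-based detection would miss such files, here is a 
minimal sketch in plain Python. It does not use Tika. It relies only on the 
standard 14-byte BMP file header, which starts with the bytes "BM". The helper 
names and the zero-filled payload are hypothetical placeholders, and whether 
the embedded files in example.DOC really are the bytes that follow such a 
header is still the open question above.

```python
import struct

BMP_MAGIC = b"BM"        # signature at the start of a BMP file header
FILE_HEADER_LEN = 14     # size of the BMP file header in bytes


def has_bmp_magic(data: bytes) -> bool:
    """Return True if the data starts with the BMP signature."""
    return data[:2] == BMP_MAGIC


def make_file_header(payload_len: int, pixel_offset: int) -> bytes:
    """Build a 14-byte BMP file header (hypothetical helper for this sketch).

    Layout, little-endian: signature, total file size, two reserved
    16-bit fields, offset from file start to the pixel data.
    """
    return struct.pack(
        "<2sIHHI",
        BMP_MAGIC,
        FILE_HEADER_LEN + payload_len,
        0,
        0,
        pixel_offset,
    )


# Placeholder payload: stands in for a 40-byte info header plus 4 bytes of
# pixel data. It is all zeros and is NOT taken from the attached files.
payload = bytes(40 + 4)
with_header = make_file_header(len(payload), FILE_HEADER_LEN + 40) + payload

print(has_bmp_magic(with_header))  # True
print(has_bmp_magic(payload))      # False
```

A detector that only inspects the leading bytes has nothing to match on in 
the second case, so it needs some other hint about the content type.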

> Word Document extracted mpga file extension instead of bitmap 
> --------------------------------------------------------------
>
>                 Key: TIKA-3827
>                 URL: https://issues.apache.org/jira/browse/TIKA-3827
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tika User
>            Priority: Major
>         Attachments: example.DOC, file_1.bmp, file_2.bmp
>
>
> When we tried to parse the .doc document, Tika extracted two mpga files that 
> can't be opened or played. We suspect they should be bitmap image files. 
> The Tika version we are using is 2.4.1.
> [^example.DOC]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
